STRUCTURES OF SUBSTRATE BINDING POCKETS OF SCF COMPLEXES 

A portion of the disclosure of this patent document contains material that is subject to copyright protection.
The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or patent
disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright
rights whatsoever.
FIELD OF THE INVENTION 

The present invention relates to binding pockets of Skp1-Cdc53/Cullin-F-box protein (SCF) E3 ubiquitin
ligases associated with substrate selection and/or orientation. In particular, the invention relates to a crystal comprising
such binding pockets. The crystal may be useful for modeling and/or synthesizing mimetics of a binding pocket or
ligands that associate with the binding pocket. Such mimetics or ligands may be capable of acting as modulators of
the interactions of an SCF E3 ubiquitin ligase and its substrates, and they may be useful for treating, inhibiting, or
preventing diseases modulated by such interactions.

Methods are also provided for regulating an SCF E3 ubiquitin ligase comprising changing a binding pocket
associated with substrate selection and/or orientation.
BACKGROUND 

The ubiquitin proteolytic system controls the precisely timed degradation of regulatory proteins in signaling,
development and cell cycle progression. Substrate ubiquitination is catalyzed by a cascade of enzymes, termed E1, E2
and E3, which activate and then conjugate ubiquitin to the substrate (Hershko and Ciechanover, 1998). E3 enzymes,
also known as ubiquitin ligases, contain substrate-specific recognition domains and catalyze the final step in ubiquitin
transfer. Recognition is mediated by primary sequence elements in the substrate, referred to as degrons (Varshavsky,
1991). Control of the E3-substrate interaction forms the basis for regulated proteolysis; often post-translational
substrate modification, most commonly phosphorylation, serves to target substrates to their cognate E3 enzymes
(Deshaies, 1999). Two main classes of E3 enzyme are now evident, as characterized by the presence of either a HECT
domain or a RING domain. The HECT domain class forms a catalytically essential thioester with ubiquitin, whereas
the RING domain class relies on the E2 enzymes to provide catalytic activity (Pickart, 2001). The RING domain
forms an E2 docking site and orients the substrate with respect to the E2.

Phosphorylation-dependent degrons direct substrates to a recently described class of multisubunit E3
enzymes termed Skp1-Cdc53/Cullin-F-box protein (SCF) complexes. SCF complexes are built on an invariant core
machinery composed of the adapter protein Skp1, the scaffold protein Cdc53 (called Cul1 in metazoans), and the
RING-H2 domain protein Rbx1 (also called Roc1 or Hrt1), which interacts with an E2 enzyme, usually Cdc34
(Pickart, 2001). Substrates are brought to the core complex by one of a large family of variable adapter subunits called
F-box proteins, each of which targets a limited number of specific substrates (Bai et al., 1996; Patton et al., 1998).
F-box proteins typically have a bipartite structure with an N-terminal ~40 amino acid F-box motif and a C-terminal
protein-protein interaction domain, such as WD40 repeats or leucine-rich repeats (LRRs), which bind substrates (Bai et al.,
1996; Feldman et al., 1997; Skowyra et al., 1997). The overall architecture of SCF complexes is conserved in several
related ubiquitin ligase complexes including the Anaphase Promoting Complex/Cyclosome and the Von Hippel-Lindau
(VHL) tumor suppressor protein complex, each of which contains cullin family members, RING-H2 domain
and substrate recognition subunits (Pickart, 2001; Kaelin, 2002).

Cell cycle progression depends on the precisely timed elimination of cyclins and cyclin-dependent kinase
(CDK) inhibitors by the ubiquitin system (Harper et al., 2002). In yeast, G1 cyclin CDK activity phosphorylates a
CDK inhibitor called Sic1, whose degradation is necessary for onset of B-type cyclin CDK activity and DNA
replication (Schwob et al., 1994). Phospho-Sic1 is specifically recognized by the F-box protein Cdc4, which recruits
Sic1 for ubiquitination by the Cdc34-SCF complex (Bai et al., 1996; Feldman et al., 1997; Skowyra et al., 1997).
Stable forms of Sic1 that lack CDK phosphorylation sites cause a G1 phase arrest (Verma et al., 1997), whereas
deletion of SIC1 causes premature DNA replication and rampant genome instability (Lengronne and Schwob, 2002).
Cdc4 recruits several other substrates to the SCF core complex in a phosphorylation-dependent manner, including the
Cln-Cdc28 inhibitor/cytoskeletal scaffold protein Far1, the replication protein Cdc6 and the transcription factor Gcn4
(Patton et al., 1998). The F-box protein Grr1 functions in an analogous manner to render G1 cyclins unstable
throughout the cell cycle, in a manner that depends on recognition of phospho-epitopes by the LRR domain of Grr1
(Skowyra et al., 1997; Hsiung et al., 2001).

In the metazoan cell cycle, SCF complexes target phosphorylated forms of the CDK inhibitor p27Kip1 and
cyclin E, among other substrates. Interestingly, F-box protein specificity for these substrates is reversed compared to
yeast, in that the WD40 domain of hCdc4/Fbw7/Ago/SEL-10 recognizes cyclin E (Strohmaier et al., 2001; Koepp et
al., 2001; Moberg et al., 2001), whereas the LRR domain of Skp2 recognizes p27Kip1 in conjunction with the
CDK-binding protein Cks1 (Harper, 2001). Both of these degradation pathways are perturbed in cancer cells. Many primary
tumors express high levels of Skp2, which leads to premature degradation of p27Kip1 and cell cycle entry (Harper,
2001). Conversely, loss of Cdc4 function causes deregulation of cyclin E-CDK2 activity, which leads to precocious S
phase entry and genome instability (Spruck et al., 1999). Mutations in the Drosophila homolog of CDC4, called ago,
were isolated as homozygous recessive alleles in a screen for excess cell proliferation, a defect attributed to ectopic
cyclin E activity (Moberg et al., 2001). Mutations in hCDC4 have been detected in several cancer cell lines that
exhibit high levels of cyclin E (Moberg et al., 2001; Strohmaier et al., 2001), as well as in a significant fraction of
primary endometrial cancers (Spruck et al., 2002). In addition, hCDC4 is located in the 4q32 region, which is often
deleted in various cancers (Spruck et al., 2002). Significantly, a high level of cyclin E correlates strongly with low
survival rates in breast cancer (Keyomarsi et al., 2002). Other important substrates appear to be targeted for
degradation by Cdc4 orthologs in a phosphorylation-dependent manner, including activated forms of the developmental
regulator Notch and the presenilins, which are implicated in familial early onset Alzheimer's disease (Lai, 2002;
Selkoe, 2001). SCF-dependent proteolysis also mediates other important signaling events, including
phosphorylation-dependent degradation of the NF-κB inhibitor IκBα and the proto-oncogene product β-catenin by the F-box protein
β-TrCP (Pickart, 2001).

Several F-box proteins can recognize short phosphopeptide motifs that correspond to substrate sequences.
However, it is unknown whether such interactions are analogous to phosphorylation-dependent interactions of SH2,
PTB, 14-3-3, WW and FHA domains, each of which has been crystallized with its cognate phosphopeptide (Yaffe and
Elia, 2001). For many SCF substrates, including Sic1, Cdc6 and Cln2, phosphorylation on multiple dispersed sites is
required for recognition and degradation (Patton et al., 1998). We recently defined a high affinity consensus
phosphopeptide binding motif for Cdc4, termed the Cdc4 phospho-degron (CPD), which bears the consensus
I/L-I/L/P-pT-P-<KR>4 [SEQ ID NO:1], where < > indicates a disallowed residue (Nash et al., 2001). The P0
phospho-threonine residue, or less favorably a phospho-serine residue, and the P+1 proline are essential for interaction with
Cdc4. Unexpectedly, the CPD consensus is at odds with the CDK phosphorylation site consensus, S/T-P-X-K/R [SEQ
ID NO:2] (Endicott et al., 1999). Thus, substrate recognition by the targeting kinase is counter-balanced against the
targeting component of the degradation machinery. All nine CPD sites in Sic1 have one or more sub-optimal features:
all lack consensus hydrophobic residues in the P-1 or P-2 positions, four have serine in place of threonine in the P0
position, and seven contain a disfavored basic residue in one of the +2 to +5 positions. Unexpectedly, Sic1 must be
phosphorylated on at least six of its nine sites in order to allow recognition by Cdc4 (Nash et al., 2001). This
requirement for multi-site phosphorylation in principle renders the rate of Sic1 degradation proportional to the sixth
power of G1 CDK concentration (Ferrell, 1996). The inherently ultrasensitive nature of the Sic1 degradation reaction
appears critical for the coordinated initiation of DNA replication by S phase CDK activity (Nash et al., 2001;
Lengronne and Schwob, 2002).
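
The consensus described above can be written as a simple pattern match. The following Python sketch is illustrative only and is not part of the original disclosure: it assumes a one-letter sequence notation in which a lower-case "t" stands for phospho-threonine, and the example peptides are hypothetical. It encodes the strict consensus (phospho-threonine at P0, proline at P+1, no K or R at P+2 to P+5) and shows why a site that matches the CDK consensus S/T-P-X-K/R fails the strict CPD consensus once phosphorylated.

```python
import re

# Assumption (ours, for illustration): sequences use one-letter codes, with a
# lower-case "t" marking phospho-threonine at the P0 position.
ALLOWED_P2_TO_P5 = "ACDEFGHILMNPQSTVWY"  # the 20 standard residues minus K and R

# I/L at P-2, I/L/P at P-1, pT at P0, P at P+1, then four positions without K/R.
CPD_CONSENSUS = re.compile(r"[IL][ILP]tP[" + ALLOWED_P2_TO_P5 + r"]{4}")

# CDK phosphorylation site consensus S/T-P-X-K/R on the unphosphorylated sequence.
CDK_CONSENSUS = re.compile(r"[ST]P.[KR]")


def find_cpd_sites(sequence):
    """Return (start_index, matched_text) for each strict CPD consensus match."""
    return [(m.start(), m.group()) for m in CPD_CONSENSUS.finditer(sequence)]


def is_cdk_site(sequence):
    """Return True if the sequence contains a CDK consensus site."""
    return CDK_CONSENSUS.search(sequence) is not None


# Hypothetical peptides, not taken from any table in this document.
optimal = "AALLtPPQSGAA"           # satisfies every consensus position
cdk_like = "AALLTPPKSGAA"          # CDK consensus site: T-P-P-K
cdk_like_phospho = "AALLtPPKSGAA"  # the same site after phosphorylation of T

print(find_cpd_sites(optimal))           # one match starting at index 2
print(is_cdk_site(cdk_like))             # True
print(find_cpd_sites(cdk_like_phospho))  # empty: K falls in the P+2 to P+5 window
```

The basic residue that the CDK consensus requires at the +3 position lies inside the window where the CPD consensus disallows K and R, which is the conflict the paragraph above describes.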

The mechanism of the ubiquitin conjugation reaction is not well understood. The ability of E2-E3 enzyme
complexes to form polymers of ubiquitin, itself an 8 kDa protein, on a protein substrate presumably demands a large
catalytic cradle simply to accommodate the initial reactants (Pickart, 2001). The sequential addition of ubiquitin
moieties onto the substrate must also entail considerable flexibility of the substrate and/or the enzyme complex in
order to extend the ubiquitin chain. Recent structure determination and modeling of three E2-E3 complexes has
provided insight into these issues. A complex of the E2 enzyme UbcH7 and the HECT domain enzyme E6AP reveals
a distance of ~50 Å between the E2 and E3 active sites, suggesting that catalytic transfer of ubiquitin requires large
scale movements in an as yet undefined process (Huang et al., 1999). Similarly, a complex between UbcH7 and the
RING domain E3 c-Cbl contains a substantial gap between the E2 active site and the substrate binding site on c-Cbl
(Zheng et al., 2000). Structures of the SOCS-box adapter protein VHL in complex with a hydroxylated substrate
peptide have recently been solved (Kaelin, 2002), but the orientation of the substrate binding site with respect to the
E2 enzyme is unknown. Finally, structure determination and molecular modeling of the holo-SCFSkp2 complex again
suggests a distance of ~50 Å between the substrate binding LRR domain in Skp2 and the E2 active site (Zheng et al.,
2002). Notably, the extensive interdigitation of the Skp1-Skp2 interface and the Skp2 inter-domain interface rigidly
fixes the orientation of the LRRs of Skp2, suggesting that the F-box protein might hold the substrate in a very precise
orientation with respect to the E2 enzyme (Schulman et al., 2000). However, because the substrate binding site on
Skp2 has not been determined, either by mutation or by co-crystallization with substrate peptide, it is not possible to
deduce how SCF substrates might be positioned with respect to the E2 catalytic site.

SUMMARY OF THE INVENTION 

Applicants have determined the structures of binding pockets of SCF E3 ubiquitin ligases involved in
substrate recognition and/or orientation. More particularly, Applicants have solved the X-ray crystal structure of
binding pockets of F-box proteins/F-box protein-Skp1 complexes of SCF E3 ubiquitin ligases that interact with Cdc4
phospho-degron (CPD) motifs.

Solving the crystal structure has enabled the determination of key structural features of substrate binding
pockets of an SCF E3 ubiquitin ligase, particularly the shape of binding pockets, or parts thereof, that permit
association of a substrate with an SCF E3 ubiquitin ligase or part thereof. The crystal structure also enables the
determination of key structural features in substrates or ligands that interact or associate with the binding pockets.

Knowledge of the structural features of substrate binding pockets of an SCF E3 ubiquitin ligase is of
significant utility in drug discovery. The SCF E3 ubiquitin ligase substrate interaction is the basis of many biological
mechanisms. In particular it is the basis for regulated ubiquitin proteolysis resulting in degradation of regulatory
proteins involved in signaling, development, and cell cycle progression. In addition, drugs may exert their effects
through association with the binding pockets of SCF E3 ubiquitin ligases. The associations may occur with all or any
parts of a binding pocket. An understanding of the association of a drug with binding pockets of SCF E3 ubiquitin
ligases will lead to the design and optimization of drugs having more favorable associations with their targets and thus
provide improved biological effects. Therefore, information about the shape and structure of substrate binding pockets
of SCF E3 ubiquitin ligases is invaluable in designing potential modulators of the SCF E3 ubiquitin ligases for use in
treating diseases and conditions associated with or modulated by the SCF ubiquitin ligases, including cancer and
Alzheimer's disease.

The present invention relates to an isolated binding pocket of an SCF E3 ubiquitin ligase involved in
substrate recognition and/or orientation. In an embodiment, the invention relates to a binding pocket of an F-box
protein/F-box protein-Skp1 complex of an SCF E3 ubiquitin ligase that interacts with a Cdc4 phospho-degron (CPD)
motif. In an aspect of the invention, the binding pocket regulates the binding of a CPD motif to an SCF E3 ubiquitin
ligase.

In an embodiment, the invention comprises the structure of a WD repeat domain of an F-box protein. The
structure may also comprise a helical linker of an F-box protein and optionally an F-box domain of an F-box protein.
Still further the structure may comprise a Skp1 protein.

The invention also relates to a crystal comprising a binding pocket of an SCF E3 ubiquitin ligase involved in
substrate recognition and/or orientation.

In an embodiment, the invention provides a crystal comprising a WD repeat domain of an F-box protein. The
crystal may also comprise a helical linker of an F-box protein and optionally an F-box domain of an F-box protein.
Still further the crystal may comprise a Skp1 protein.

The present invention also contemplates molecules or molecular complexes that comprise all or parts of
either one or more binding pockets of the invention, or homologs of these binding pockets that have similar structure
and shape.

The invention also contemplates a crystal comprising a binding pocket of an SCF E3 ubiquitin ligase involved
with substrate recognition and/or orientation in association with a substrate (e.g. CPD motif). A substrate may be
complexed or associated with a binding pocket. The invention further contemplates a crystal comprising a binding
pocket of an SCF E3 ubiquitin ligase involved with substrate recognition and/or orientation in association with a ligand.
A ligand may be a modulator of the activity of an SCF E3 ubiquitin ligase. A ligand may be complexed or associated
with a binding pocket.

In an aspect the invention contemplates a crystal comprising a binding pocket of an SCF E3 ubiquitin ligase
involved in substrate recognition and/or orientation complexed with a substrate from which it is possible to derive
structural data for the substrate.

The shape and structure of a binding pocket may be defined by selected atomic contacts in the pocket. In an
embodiment, the binding pocket is defined by one or more atomic interactions or enzyme atomic contacts as set forth
in Table 3 or Table 4. Each of the atomic interactions is defined in Table 3 or Table 4 by an atomic contact (more
preferably, a specific atom where indicated) on the F-box protein and by an atomic contact (more preferably a specific
atom where indicated) on the substrate. The atomic interactions are also defined by an atomic contact on one portion
of the F-box protein and an atomic contact on another portion of the F-box protein.
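
One way to picture how selected atomic contacts define a pocket is to list every pocket atom that lies within a distance cutoff of a substrate atom. The Python sketch below is illustrative and is not taken from Table 3 or Table 4: the atom labels, the coordinates, and the 4.0 Å cutoff are all assumptions chosen for the example.

```python
import math


def find_contacts(pocket_atoms, substrate_atoms, cutoff=4.0):
    """Return (pocket_label, substrate_label, distance) for pairs within cutoff.

    Each atom is a (label, (x, y, z)) tuple with coordinates in angstroms.
    The 4.0 angstrom default is an illustrative assumption.
    """
    contacts = []
    for pocket_label, pocket_xyz in pocket_atoms:
        for substrate_label, substrate_xyz in substrate_atoms:
            distance = math.dist(pocket_xyz, substrate_xyz)
            if distance <= cutoff:
                contacts.append((pocket_label, substrate_label, round(distance, 2)))
    return contacts


# Hypothetical atoms; labels and coordinates are invented for illustration.
pocket = [
    ("pocket Arg NH1", (0.0, 0.0, 0.0)),
    ("pocket Trp NE1", (10.0, 0.0, 0.0)),
]
substrate = [
    ("substrate pThr O1P", (3.0, 0.0, 0.0)),
    ("substrate Pro CB", (10.0, 3.0, 4.0)),
]

for contact in find_contacts(pocket, substrate):
    print(contact)
```

With these invented coordinates only the Arg/pThr pair falls inside the default cutoff; widening the cutoff adds further pairs, so the choice of cutoff determines which contacts are treated as pocket-defining.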

An isolated polypeptide comprising a binding pocket with the shape and structure of a binding pocket
described herein is also within the scope of the invention.

The invention also provides a method for preparing a crystal of the invention, preferably a crystal of a 
binding pocket of an SCF E3 ubiquitin ligase involved in substrate recognition and/or orientation, or a complex of 
such a binding pocket and a substrate. 

Crystal structures of the invention enable a model to be produced for a binding pocket of the invention, or
complexes or parts thereof. The models will provide structural information about the interactions of a substrate or
ligand with a binding pocket. Models may also be produced for substrates and ligands. A model and/or the crystal
structure of the present invention may be stored on a computer-readable medium.

The present invention includes a model of a binding pocket of the present invention that substantially
represents the structural coordinates specified in Table 6 or portions thereof. The invention also includes a model that
comprises modifications of the structure substantially represented by the structural coordinates specified in Table 6. A
model is a representation or image that predicts the actual structure of the binding pocket. As such, a model is a tool
that can be used to probe the relationship between a binding pocket's structure and function at the atomic level, and to
design molecules that can modulate the binding site and, accordingly, the activity of an F-box protein or SCF complex.

Thus, the invention provides a model of: (a) a binding pocket of an SCF E3 ubiquitin ligase involved in
substrate recognition and/or orientation; and (b) a modification of the model of (a).

A method is also provided for producing a model of the invention representing a binding pocket of an SCF
E3 ubiquitin ligase involved in substrate recognition and/or orientation, comprising representing amino acids of the
binding pocket at substantially the structural coordinates specified in Table 6.
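
Representing amino acids at given structural coordinates is commonly done with fixed-column atom records. The Python sketch below is illustrative only: the layout of Table 6 is not reproduced in this section, so the sketch assumes PDB-style ATOM records, and the example atom (residue number, chain, and coordinates) is hypothetical.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def format_atom(atom):
    """Write one atom as a PDB-style ATOM record (through the z coordinate).

    Simplification: the atom name is left-justified in columns 13-16, which
    differs slightly from the alignment used in deposited PDB files.
    """
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        atom.serial, atom.name, atom.res_name, atom.chain, atom.res_seq,
        atom.x, atom.y, atom.z,
    )


def parse_atom(line):
    """Read the fields written by format_atom back from their fixed columns."""
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


# Hypothetical atom; the values are invented and do not come from Table 6.
example = Atom(1, "CA", "ARG", "B", 100, 12.345, -6.789, 0.5)
record = format_atom(example)
print(record)
print(parse_atom(record) == example)
```

Storing a model as records of this kind is one way it could be placed on a computer-readable medium and read back without loss at three-decimal precision.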

A crystal and/or model of the invention may be used in a method of determining the secondary and/or tertiary
structures of a polypeptide or binding pocket with incompletely characterised structure. Thus, a method is provided
for determining at least a portion of the secondary and/or tertiary structure of molecules or molecular complexes
which contain at least some structurally similar features to a binding pocket of the invention. This is achieved by using
at least some of the structural coordinates set out in Table 6.

A crystal of the invention may be useful for designing, modeling, identifying, evaluating, and/or synthesizing
mimetics of a binding pocket or ligands or substrates that associate with a binding pocket. Such mimetics or ligands
may be capable of acting as modulators of an F-box protein or SCF E3 ubiquitin ligase activity, and they may be
useful for treating, inhibiting, or preventing diseases modulated by such a protein or ligase.

Thus, the present invention contemplates a method of identifying a modulator of an F-box protein or an SCF
E3 ubiquitin ligase comprising the step of applying the structural coordinates of a binding pocket, or atomic
interactions, or atomic contacts of a binding pocket, to computationally evaluate a test ligand or substrate for its ability
to associate with the binding pocket, or part thereof. Use of the structural coordinates of a binding pocket, or atomic
interactions, or atomic contacts of a binding pocket to design or identify a modulator is also provided.

In an embodiment, the invention contemplates a method of identifying a modulator of an F-box protein or an
SCF E3 ubiquitin ligase comprising determining if a test agent inhibits or potentiates the interaction of an F-box
protein or SCF E3 ubiquitin ligase with its substrate.

The invention further contemplates classes of modulators of F-box proteins or SCF E3 ubiquitin ligases
based on the shape and structure of a ligand or substrate defined in relation to the molecule's spatial association with a
binding pocket of the invention. Generally, a method is provided for designing potential inhibitors of an F-box
protein-substrate interaction or SCF E3 ubiquitin ligase-substrate interaction comprising the step of applying the
structural coordinates of a substrate or ligand defined in relation to its spatial association with a binding pocket, or a
part thereof, to generate a compound that is capable of associating with the binding pocket.

It will be appreciated that a modulator of an F-box protein or SCF E3 ubiquitin ligase may be identified by
generating an actual secondary or three-dimensional model of a binding pocket, synthesizing a compound, and
examining the components to find whether the required interaction occurs.

A potential modulator of an F-box protein or SCF E3 ubiquitin ligase identified by a method of the present
invention may be confirmed as a modulator by synthesizing the compound, and testing its effect on the F-box protein
or SCF E3 ubiquitin ligase in an assay.

A modulator of the invention may be converted using customary methods into pharmaceutical compositions.
A modulator may be formulated into a pharmaceutical composition containing a modulator either alone or together
with other active substances.

Therefore, the methods of the invention for identifying modulators may comprise one or more of the
following additional steps:

(a) testing whether the modulator is a modulator of the activity of an F-box protein or an SCF E3
ubiquitin ligase, preferably testing the activity of the modulator in cellular assays and animal model
assays;

(b) modifying the modulator;

(c) optionally rerunning steps (a) or (b); and

(d) preparing a pharmaceutical composition comprising the modulator.

Steps (a), (b), (c) and (d) may be carried out in any order, at different points in time, and they need not be
sequential.

Still another aspect of the present invention provides a method of conducting a drug discovery business
comprising:

(a) providing one or more systems employing the atomic interactions, atomic contacts, or structural
coordinates of a binding pocket of an F-box protein or SCF E3 ubiquitin ligase involved in substrate
recognition and/or orientation, for identifying agents by their ability to inhibit or potentiate the
atomic interactions or atomic contacts of a binding pocket;

(b) conducting therapeutic profiling of agents identified in step (a), or further analogs thereof, for
efficacy and toxicity in animals; and

(c) formulating a pharmaceutical preparation including one or more agents identified in step (b) as
having an acceptable therapeutic profile.

A further aspect of the present invention provides a method of conducting a drug discovery business
comprising:

(a) providing one or more systems for identifying agents by their ability to inhibit or potentiate the
interaction between an F-box protein or SCF complex and its substrate;

(b) conducting therapeutic profiling of agents identified in step (a), or further analogs thereof, for
efficacy and toxicity in animals; and

(c) formulating a pharmaceutical preparation including one or more agents identified in step (b) as
having an acceptable therapeutic profile.

In certain embodiments, the subject methods can also include a step of establishing a distribution system for
distributing the pharmaceutical preparation for sale, and may optionally include establishing a sales group for
marketing the pharmaceutical preparation.

Yet another aspect of the invention provides a method of conducting a target discovery business comprising:

(a) providing one or more systems employing the atomic interactions, atomic contacts, or structural
coordinates of a binding pocket of an F-box protein or SCF complex involved in substrate
recognition and/or orientation, for identifying agents by their ability to inhibit or potentiate the
atomic interactions or atomic contacts;

(b) (optionally) conducting therapeutic profiling of agents identified in step (a) for efficacy and toxicity
in animals; and

(c) licensing, to a third party, the rights for further drug development and/or sales for agents identified
in step (a), or analogs thereof.

Methods are also provided for regulating an F-box protein-substrate interaction or an SCF E3 ubiquitin
ligase-substrate interaction by changing a binding pocket involved in substrate recognition and/or orientation. A
binding pocket may be changed by altering amino acid residues forming the binding pocket (e.g. introducing
mutations) or using a modulator.

The invention also contemplates a method of treating or preventing a condition or disease associated with an 
F-box protein or an SCF E3 ubiquitin ligase in a cellular organism, comprising: 

(a) administering a modulator of the invention in an acceptable pharmaceutical preparation; and 

(b) potentiating or inhibiting the F-box protein or SCF E3 ubiquitin ligase to treat or prevent the 
disease. 

In an embodiment the condition or disease is cancer or Alzheimer's disease. 

The invention provides for the use of a modulator identified by the methods of the invention in the 
preparation of a medicament to treat or prevent a disease in a cellular organism. Use of modulators of the invention to 
manufacture a medicament is also provided. 

These and other aspects of the present invention will become evident upon reference to the following detailed 
description and Tables, and attached drawings. 
DESCRIPTION OF THE DRAWINGS AND TABLES 

The present invention will now be described only by way of example, in which reference will be made to the 
following Figures: 

Figure 1 shows structure based sequence alignments of (A) Skp1 orthologs and (B) Cdc4 orthologs (red) and
paralogs (black). Human Fbw7 and β-TrCP1 are isoforms 1 and 2, respectively. Secondary structure elements are
colored as in Figure 2A. Disordered regions in the crystal structure are shown as dashed lines. Red residues are
essential for Cdc4 function, blue residues strongly influence but do not abrogate function, green residues are
non-essential but conserved around the binding pocket, and yellow residues are conserved elsewhere. Circles indicate
mutations associated with excessive cell proliferation in flies and/or cancer in humans. Deletion of residues 37-64 in
Skp1 is denoted by a triangle and a replacement of two closely placed loops from residues 602-605 and 609-624 is
denoted by the underline of the short interloop sequence Gly-Glu-Leu. Insertions to optimize sequence alignments are
indicated by number of residues inserted in gray. The non-standard β-strand element 9 1 in ScCdc4 is marked by the
red asterisk and is shown in full at the bottom of the alignment. Residues that anchor helix α6 to the F-box domain are
marked by green hearts, those that anchor helix α6 to the WD40 domain by red hearts and those that make direct
contact between the WD40 domain and F-box domain by blue asterisks. [SEQ ID NOs 3-16.]

Figure 2 shows an overview of the Skp1-Cdc4-CPD complex. (A) Ribbon representation of Skp1 and the
F-box domain (274-319), the helical linker region (331-366), and the WD40 domain of Cdc4 (367-744) coloured green,
red, pink, and blue, respectively. The bound cyclin E derived CPD peptide is shown in purple with the
phospho-threonine moiety shown in ball and stick representation. Secondary structure elements are indicated. Positions of
disordered loop regions are shown as ribbon breaks. All ribbons representations were generated using Ribbons. (B)
Ribbons representation highlighting the WD40 domain of Cdc4. β-propeller blades are denoted PB1 to PB8, and the
component secondary structure elements are indicated. Ribbons and CPD peptide are coloured as in (A). Position of
the WD40 domain is identical to that in Figures 4A to 4C. (C) The structured linkage between the WD40 domain and
the F-box domain of Cdc4.

Figure 3 shows an overview of the CPD binding region of the Cdc4 WD40 repeat domain. (A) Molecular
surface representation of the CPD binding pocket indicating invariant and highly conserved residues. Basic,
hydrophobic and small residues are coloured blue, green and orange respectively. The bound CPD is shown in ball
and stick representation with carbon, nitrogen, oxygen and phosphorus atoms coloured white, blue, red and yellow
respectively. All surface representations were generated using Grasp. (B) Surface representation of CPD binding region
as oriented in (A) coloured according to electrostatic potential. Blue and red indicate regions of positive and negative
potential respectively (10 to -10 kBT). Residues of the bound CPD are labeled. (C) Stereo ribbons representation
highlighting side chains and molecular interactions in the CPD binding pocket. CPD residues and highly conserved
and invariant Cdc4 residues are displayed in ball and stick representation. Sites of mutation that give rise to severe
loss of function are coloured red, and intermediate loss of function are coloured yellow (see Table 5). All other highly
conserved and invariant residues are coloured green. Reference propeller blades of the WD40 repeat domain are
indicated. (D) Stereo ribbons representation of the CPD binding pocket highlighting cancer causing mutations in
Drosophila and human Cdc4 orthologues. Arginine mutations in H-cell lines or endometrial cells are coloured red.
Drosophila mutations are coloured blue and Cdc4 temperature sensitive mutations (Rosamond personal
communication) are coloured yellow. (E) Multiple Anomalous Dispersion phased electron density map corresponding
to the CPD bound to the WD40 repeat domain of Cdc4. Refined CPD model is shown in ball and stick representation.
Figure generated using O. (F) Schematic of CPD binding pocket interactions with the CPD peptide.

Figure 4 shows (A) Stereo ribbons representation of the human Skp1-Skp2 complex superimposed on the
yeast Skp1-Cdc4-CPD complex. Human Skp1-Skp2 and yeast Skp1-Cdc4 were superimposed through a least squares
optimization of Skp1 β-strands 1 to 3 and α-helices 1 to 6 (RMSD = 0.74 Å). The yeast Skp1-Cdc4 complex is
coloured as in Figure 2. Human Skp1, the Skp2 F-box, and the Skp2 leucine-rich repeat domain are coloured orange,
green, and light blue, respectively. Skp1 and F-box secondary structure elements that deviate significantly in size and
position between the two structures are labeled. (B) Model of the SCFCdc4-CPD-E2 complex. The yeast Skp1-Cdc4-CPD
complex is coloured as in Figure 2. Cul1, Rbx1, and E2 proteins are coloured pink, red, and light blue, respectively.
The arrow indicates the distance between the peptide binding site and the active site cysteine of the E2. The structure
was generated using the previously reported ternary complex of the cullin Cdc53, Rbx1, and Skp1, and superimposing the
E2 structure from the E2/Cbl RING finger structure and the structure of Skp1, Cdc4 and a phosphorylated CPD peptide.

Figure 5 shows (A) Selection of Sic1 phosphoisoforms by wild type and mutant forms of Cdc4. (B) In vitro
ubiquitination of Sic1 isoforms by wild type and mutant SCFCdc4 complexes. (C) Natural CPD sites deviate from the
optimal CPD by one or more residues.

Figure 6 shows substrate orientation within the Skp1-Cdc4-CPD complex. (A) Comparison of the ScSkp1-ScCdc4-CPD
complex and the hSkp1-hSkp2 complex. Complexes were superimposed through a least squares
optimization of Skp1 β-strands 1 to 3 and α-helices 1 to 6 (RMSD Cα = 0.74 Å). Skp1 and F-box secondary structure
elements that deviate significantly in size and position between the two structures are labeled. (B) Model of the
ubiquitin-E2-SCFCdc4-CPD complex. The arrow indicates the 59 Å distance separating the phosphate group of the
CPD and the active site cysteine of the E2.

Figure 7 shows the CPD binding pocket of the WD40 domain. (A) Surface representation of the CPD binding
pocket indicating invariant and highly conserved residues. Basic (blue), hydrophobic (green) and small polar residues
(orange) are shown. The bound CPD is in ball and stick representation with carbon (white), nitrogen (blue), oxygen
(red) and phosphorus (yellow) atoms shown. (B) Surface representation of CPD binding region indicating
electrostatic potential. Blue and red indicate regions of positive and negative potential, respectively, over the range 10
to -10 kBT. (C) Stereo ribbons representation of side chains and molecular interactions in the CPD binding pocket.
Highly conserved and invariant side chains of Cdc4 and the CPD are displayed in ball and stick representation. Sites
of mutation that give rise to severe and intermediate loss of function (see Figure 8) are colored red and blue,
respectively; non-essential residues are colored green. (D) Schematic of CPD binding pocket interactions with the
CPD peptide.

Figure 8 shows structure-guided mutational analysis of Cdc4. (A) Residues required for interaction of
phospho-Sic1 and Cdc4 in vitro. Sic1 was phosphorylated with Cln2-Cdc28 kinase and captured onto resin loaded
with either wild type or the indicated mutant forms of Skp1-Cdc4 complex. (B) Residues essential for Cdc4 function
in vivo. Complementation of a cdc4Δ strain by the indicated alleles was assessed in a plasmid shuffle assay. The
R485A, R467A and R534A mutations in Cdc4 have been previously shown to disrupt function in vivo (Nash et al.,
2001) and so are not shown. (C) Effect of Cdc4 mutations on sensitivity to increased SIC1 dosage. Strains bearing
indicated CDC4 alleles were tested for sensitivity to overexpression of wild type SIC1 and a partially stabilized
version, SIC1 Thr33Val, from the GAL1 promoter. Strains were incubated on galactose or glucose medium for 2 days at
30°C.

Figure 9 shows the modulation of the multisite requirement for phospho-Sic1-Cdc4 interaction. (A) All
natural CPD sites in Sic1 deviate from the CPD consensus. Underlined residues indicate sub-optimal residues at the
P-1 and P-2 positions, boxed residues indicate sub-optimal basic residues at the P+2 to P+5 positions and asterisks
indicate a sub-optimal pSer at the P0 position. (B) Capture of Sic1 phospho-isoforms by wild type and mutant Cdc4.
Pools of differentially phosphorylated Sic1 were captured on Skp1-Cdc4 resin, using either wild type or the indicated
mutant forms of Cdc4 compromised for selection at the P-1 position (V384N W717N) or the P+2 to P+5 positions
(K402A R443D). The input and bound phospho-Sic1 isoform pools were resolved by denaturing IEF-2D gel
electrophoresis and visualized by anti-Sic1 immunoblot. (C) Ubiquitination of phospho-Sic1 isoforms by wild type
and mutant SCFCdc4 complexes. Pools of differentially phosphorylated Sic1 were incubated in solution with an
equimolar amount of the indicated SCFCdc4 complexes, Cdc34, ubiquitin and ATP for 1 h at 30°C. Input and reaction
products were separated and visualized as in (B). Arrows indicate the less phosphorylated forms of Sic1 captured by
Cdc4 selection mutants. Asterisk indicates more extensively ubiquitinated species. (D) Possible interaction
mechanisms for single-site and multi-site dependent substrate binding to Cdc4. In a two-site cooperative interaction
model (left), a primary high affinity CPD binding site acts in conjunction with a secondary weak CPD binding site.
The free energy for the two interactions is additive and so the overall affinity increases multiplicatively (the
individual Kd values multiply). In a single-site allovalent interaction (right), multiple low affinity CPD sites engage
a single CPD binding site on Cdc4 in equilibrium. The high local concentration of CPD sites increases the probability
of binding such that Sic1 is unable to diffuse away from Cdc4 before re-binding occurs. The probability of re-binding
increases as an exponential function of the number of CPD sites, thus accounting for the apparent cooperativity of the
interaction.
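
The arithmetic behind the two-site cooperative model of Figure 9(D) can be sketched numerically. In the idealized case where the binding free energies of two contacts simply add, the effective dissociation constant is the product of the individual constants. The Kd values, the temperature and the 1 M standard state below are illustrative assumptions only and are not measurements from this disclosure.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)


def binding_free_energy(kd_molar, temp_k=298.15):
    """Standard free energy of binding (kJ/mol) for a dissociation constant in M."""
    return R * temp_k * math.log(kd_molar)


def combined_kd(kd_values, temp_k=298.15):
    """Effective Kd (M) when the free energies of independent contacts simply add."""
    total_dg = sum(binding_free_energy(kd, temp_k) for kd in kd_values)
    return math.exp(total_dg / (R * temp_k))


# Hypothetical primary (1 uM) and secondary (1 mM) CPD binding sites.
kd_two_site = combined_kd([1e-6, 1e-3])
print(f"effective Kd = {kd_two_site:.3g} M")
```

Under these assumptions a weak secondary contact tightens the overall interaction by several orders of magnitude. The sketch ignores any entropic cost of linking the two sites and does not model the allovalent mechanism.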

The present invention will now be described only by way of example, in which reference will be made to the 
following Tables: 

Table 1 shows data collection, structure determination and refinement statistics of a crystal of the invention. 
Table 2 shows data collection, structure determination and refinement statistics of a crystal of the invention. 
Table 3 shows intermolecular contacts in a binding pocket of the invention.

Table 4 shows intermolecular contacts in a binding pocket of the invention. 

Table 5 shows mutant cdc4 polypeptides of the invention. Mutational analysis of the CPD binding surface.
Mutants were tested in vitro for the ability to bind phosphorylated Sic1, which was captured onto GST-Skp1/Cdc4 resin
and detected with anti-Sic1 antibody. Mutants were tested in vivo for the ability to degrade GAL1-SIC1 or various
phosphorylation mutants. Sites are as follows: 3 = Thr 33, Thr 45, Ser 76; 4 = Thr 5, Thr 33, Thr 45, Ser 76; 5 = Thr
2, Thr 5, Thr 33, Thr 45, Ser 76; 6 = Thr 2, Thr 5, Thr 33, Thr 45, Ser 69, Ser 76; 7 = Thr 2, Thr 5, Thr 33, Thr 45, Ser
69, Ser 76, Ser 80. GAL1-SIC1 plasmids were transformed into a cdc4Δ strain containing a copy of CDC4 on a TRP1
ARS CEN plasmid. Strains were incubated for 2 days at 30°C.

Table 6 shows the structural coordinates of a binding pocket of the invention.
In Table 6, from the left, the second column identifies the atom number; the third identifies the atom type; the
fourth identifies the amino acid type; the sixth identifies the residue number; the seventh identifies the x coordinates;
the eighth identifies the y coordinates; the ninth identifies the z coordinates; the tenth identifies the occupancy; and the
eleventh identifies the temperature factor.
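
The column layout described for Table 6 can be read programmatically. The following is a minimal sketch that assumes each row is whitespace-delimited and that the first and fifth columns, which the description above does not define, can be skipped. The class name, function name and example row are hypothetical and are not taken from Table 6.

```python
from typing import NamedTuple


class AtomRecord(NamedTuple):
    atom_number: int
    atom_type: str
    residue_type: str
    residue_number: int
    x: float
    y: float
    z: float
    occupancy: float
    temperature_factor: float


def parse_coordinate_row(row: str) -> AtomRecord:
    """Parse one whitespace-delimited row using the column order given for Table 6."""
    fields = row.split()
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 columns, got {len(fields)}")
    # Columns 1 and 5 are not described in the text, so they are ignored here.
    return AtomRecord(
        atom_number=int(fields[1]),
        atom_type=fields[2],
        residue_type=fields[3],
        residue_number=int(fields[5]),
        x=float(fields[6]),
        y=float(fields[7]),
        z=float(fields[8]),
        occupancy=float(fields[9]),
        temperature_factor=float(fields[10]),
    )


# Hypothetical row, made up for illustration only.
example = "ATOM 1 CA TRP A 426 12.345 -6.789 30.120 1.00 25.50"
record = parse_coordinate_row(example)
print(record.residue_type, record.residue_number, record.x, record.y, record.z)
```

Rows in fixed-width coordinate files can have adjacent fields that run together, in which case slicing by character position would be needed instead of splitting on whitespace.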

Table 7 lists the oligonucleotides used in the studies described in the examples. 
Table 8 lists the plasmids used in the studies described in the examples.
DETAILED DESCRIPTION OF THE INVENTION

In accordance with the present invention there may be employed conventional molecular biology,
microbiology, and recombinant DNA techniques within the skill of the art. Such techniques are explained fully in the
literature. See for example, Sambrook, Fritsch, & Maniatis, Molecular Cloning: A Laboratory Manual, Second Edition
(1989), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.; DNA Cloning: A Practical Approach,
Volumes I and II (D.N. Glover ed. 1985); Oligonucleotide Synthesis (M.J. Gait ed. 1984); Nucleic Acid
Hybridization (B.D. Hames & S.J. Higgins eds. 1985); Transcription and Translation (B.D. Hames & S.J. Higgins eds.
1984); Animal Cell Culture (R.I. Freshney ed. 1986); Immobilized Cells and Enzymes (IRL Press, 1986); and B.
Perbal, A Practical Guide to Molecular Cloning (1984).

GLOSSARY

Abbreviations for amino acid residues are the standard 3-letter and/or 1-letter codes used in the art to refer to
one of the 20 common L-amino acids. Likewise abbreviations for nucleic acids are the standard codes used in the art.

"Skp1-Cdc53/Cullin-F-box protein (SCF) E3 ubiquitin ligases" or "SCF complex" refers to a protein
complex comprising the adaptor protein Skp1, the scaffold protein cdc53/cullin, a RING-H2 domain protein Rbx1
(also called Roc1 or Hrt1), and an F-box protein, which protein complex augments or otherwise facilitates the
ubiquitination of a protein. In certain aspects of the present invention an SCF complex refers to a complex comprising
Skp1 and an F-box protein or parts thereof.

In the context of the present invention the term "F-box protein" refers to a protein comprising a characteristic
structural motif called the F-box as described in Bai et al. (1996, Cell 86: 263-274) and a protein-protein interaction
domain, in particular a WD40 repeat motif or domain. Examples of F-box proteins include Cdc4 polypeptides, and
homologs or portions thereof, preferably portions that interact with a CPD motif (e.g. WD repeat).

A "WD40 repeat", "WD40 motif", or "WD repeat domain" is generally defined as a contiguous sequence of
about 25 to 50 amino acids with relatively well conserved sets of amino acids [i.e. Trp-Asp (WD)] at the ends (amino-
and carboxyl-terminal) of the sequence. (For reviews see Neer EJ, Schmidt CJ, Nambudripad R & Smith TF: "The
ancient regulatory-protein family of WD-repeat proteins," Nature 371, 297-300 (1994) PMID: 8090199; and Smith
TF, Gaitatzes CG, Saxena K & Neer EJ: "The WD-repeat: a common architecture for diverse functions," TIBS 24,
181-185 (1999) PMID: 10322433.) A WD repeat motif or domain can also be defined as a domain of an F-box protein
that interacts with a CPD motif or like motif.

Examples of WD-repeat-containing proteins are cdc4 polypeptides, Met30 homologues and orthologues (see
for example, GenBank Accession No. P39014 or MT30_YEAST - SEQ ID NO. 17) and β-TRCP homologues and
orthologues (see for example, GenBank Accession No. NP_033901 - SEQ ID NO. 18). Other WD40 repeat-containing
proteins will, however, be appreciated by those skilled in the art. A WD40-repeat protein also includes a part of the
protein. A person skilled in the art may conduct searches to identify proteins that contain WD-40 repeats, in particular
F-box proteins. For example, on-line databases such as GenBank or SwissProt can be searched, either with an entire
sequence of a WD-40-containing protein, or with a consensus WD-40 repeat sequence. Various search algorithms
and/or programs may be used, including FASTA, BLAST or ENTREZ. FASTA and BLAST are available as a part of
the GCG sequence analysis package (University of Wisconsin, Madison, Wis.). ENTREZ is available through the
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda,
Md. The number of WD-40 repeats in a particular protein can range from two to more than eight.

A "Cdc4 Phospho-Degron motif" or "CPD motif" is a motif that targets substrates for ubiquitination by SCF
complexes. The motif can be defined by the consensus sequence X2-X3-pThr-Pro-X4 (SEQ ID NO. 19), more
particularly X2-X3-pThr-Pro-X4-X5-X6-X7 (SEQ ID NO. 20), wherein X2 represents Leu, Pro, or Ile, preferably Leu or
Ile; X3 represents Leu, Ile, Val, or Pro, preferably Ile, Leu, or Pro; X4 represents any amino acid except basic and
bulky hydrophobic amino acids, preferably X4, X5 and X6 represent any amino acid except basic and bulky
hydrophobic amino acids, preferably X4 is any amino acid except Arg, Lys, Tyr, or Trp, more preferably X4 is Ile,
Val, Pro, or Gln, preferably X5 and X6 are any amino acid except Arg, Lys, or Tyr and more preferably X5 is Gln, Leu,
Met, Thr, or Glu, and X6 is Gln, Ala, Thr, Glu, or Ser; and X7 is any amino acid, preferably not a basic or bulky
hydrophobic amino acid, more preferably X7 is any amino acid except Arg, Lys, or Tyr, most preferably X7 is Leu,
Trp, Asp, Pro, or Gly. A CPD motif preferably comprises the consensus sequence -Leu/Gly/Tyr-Pro-pThr-Pro- (SEQ
ID NO. 21).
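
As an illustration of how the minimal consensus X2-X3-pThr-Pro-X4 might be used to scan a sequence, the following sketch expresses it as a regular expression in one-letter code. It assumes the preferred X4 exclusions (Arg, Lys, Tyr, Trp) as a stand-in for "basic and bulky hydrophobic" residues. A plain sequence does not record phosphorylation, so matches are only candidate sites. The function name and the test peptides are hypothetical.

```python
import re

# Minimal core X2-X3-pThr-Pro-X4 written in one-letter code.
# X2: Leu, Pro or Ile; X3: Leu, Ile, Val or Pro; X4: not Arg, Lys, Tyr or Trp.
# The lookahead allows overlapping candidates to be reported.
CPD_CORE = re.compile(r"(?=([LPI][LIVP]TP[^RKYW]))")


def find_candidate_cpd_sites(sequence):
    """Return (1-based Thr position, matched pentapeptide) pairs."""
    hits = []
    for match in CPD_CORE.finditer(sequence.upper()):
        thr_position = match.start(1) + 3  # Thr is the third residue of the match
        hits.append((thr_position, match.group(1)))
    return hits


# Hypothetical peptides, made up for illustration only.
print(find_candidate_cpd_sites("GSLLTPPQSGKR"))
print(find_candidate_cpd_sites("GSLLTPRQ"))
```

The second peptide carries Arg at the X4 position and is therefore not reported. Extending the pattern to X5 through X7 would follow the same approach with further negated character classes.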

A CPD motif containing protein includes proteins comprising the CPD motif including but not limited to
Gcn4, Cyclin E, Far1, Ash1, Sic1, Pcl7, Cdc16, p27Kip1, Cln2, and transcription factors such as β-catenin or IκBα, and
homologues of these proteins. The term includes but is not limited to all homologs, orthologs, naturally occurring
allelic variants, isoforms and precursors of the polypeptides. Other proteins containing CPD motif sequences may be
identified with a protein homology search, for example by searching available databases such as GenBank or
SwissProt. Various search algorithms and/or programs may be used, including FASTA, BLAST (available as a part
of the GCG sequence analysis package, University of Wisconsin, Madison, Wis.), or ENTREZ (National Center for
Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD).

The term "substrate" refers to a protein that interacts with an F-box protein targeting it for ubiquitin-dependent
proteolysis, or a protein targeted for F-box dependent degradation. Examples of substrates are CPD motif
containing proteins including Gcn4, Cyclin E, Far1, Ash1, Sic1, Pcl7, Cdc16, p27Kip1, Cln2, and transcription factors
such as β-catenin or IκBα. The term also refers to a part of a protein that interacts with an F-box protein, including a
CPD motif, and analogues of substrates or parts thereof.

A "ligand" refers to a compound or entity that associates with a binding pocket, or modulators of an F-box
protein or SCF E3 ubiquitin ligase, including inhibitors. A ligand may be designed rationally by using a model
according to the present invention.

The term "cdc4 polypeptide" is used to refer to polypeptides of the cdc4 family of proteins characterized by
an F-box motif and WD repeats. The term includes but is not limited to all homologs, orthologs, naturally occurring
allelic variants, isoforms and precursors of the polypeptides of GenBank Accession Nos. S56245 or SEQ ID NO. 22
(Saccharomyces cerevisiae cdc4), CAA65538 or SEQ ID NO. 23 (Candida albicans cdc4), AAL07271 or SEQ ID
NO. 24 (human cdc4), AAC47809 or SEQ ID NO. 25 (sel-10), AAK57547 or SEQ ID NO. 26 (Homo sapiens F-box
protein FBW7), and AAG09623F or SEQ ID NO. 27 (Homo sapiens F-box protein FBX30). In general, for example,
naturally occurring allelic variants will share significant homology (70-90%) to these sequences. Allelic variants may
contain conservative amino acid substitutions from cdc4 sequences or will contain a substitution of an amino acid
from a corresponding position in a cdc4 homologue such as, for example, the human homologue. [See Strohmaier, H.,
Nature 413 (6853), 316-322 (2001) for a description and sequence of human cdc4.] The term also includes the mutant
cdc4 polypeptides described herein. Figure 1 shows a structure based sequence alignment of cdc4 orthologs and
paralogs.

The term "cdc53" or "cdc53 polypeptide" is used interchangeably herein with the term "cullins" when
referring to a vertebrate homolog of the yeast cdc53 protein. The term "cullin polypeptide" or "cullin protein" refers
to a member of the cullins family, e.g., any one of cul-1, -2, -3, -4, -5, or -6. The term includes but is not limited to all
homologs, naturally occurring allelic variants, isoforms and precursors of a cdc53 polypeptide or cullin of GenBank
Accession Nos. AAB38821 or SEQ ID NO. 28 (Saccharomyces cerevisiae cdc53), AAC36304 or SEQ ID NO. 29
(Homo sapiens cullin 3), AAC51190 or SEQ ID NO. 30 (Homo sapiens cullin 2), NP_003581 or SEQ ID NO. 31
(Homo sapiens cullin 3), AF126404J or SEQ ID NO. 32 (Homo sapiens cullin 2), CUL1_CAEEL or SEQ ID NO.
33 (Caenorhabditis elegans cullin 1), AAA85085 or SEQ ID NO. 34 (Drosophila melanogaster cullin 1) and the
cullins described in Kipreos ET (Cell 1996 Jun 14;85(6):829-39). In general, for example, naturally occurring allelic
variants of cdc53 will share significant homology (70-90%) to the cdc53 or cullin sequences. Allelic variants may
contain conservative amino acid substitutions from the cdc4 sequence or will contain a substitution of an amino acid
from a corresponding position in a cdc4 homolog such as, for example, the human homolog.

The term "Skp1" or "Skp1 polypeptide" is used to refer to polypeptides that connect cell cycle regulators to
the ubiquitin proteolysis machinery by associating with F-box proteins through the F-box motif. The term includes
but is not limited to all homologs, naturally occurring allelic variants, isoforms and precursors of Skp1 of GenBank
Accession Nos. SKP1_SCHPO or SEQ ID NO. 35 (Schizosaccharomyces pombe), BAB62325 or SEQ ID NO. 36
(Schizosaccharomyces pombe), AAC49492 or SEQ ID NO. 37 (Saccharomyces cerevisiae), and AAB17500 or SEQ
ID NO. 38 (Saccharomyces cerevisiae). In general, for example, naturally occurring allelic variants of Skp1 will share
significant homology (70-90%) to the Skp1 sequences. Allelic variants may contain conservative amino acid
substitutions from the Skp1 sequence or will contain a substitution of an amino acid from a corresponding position in
a Skp1 homolog such as, for example, the human homolog. Figure 1 shows a structure based sequence alignment of
Skp1 homologues.

A CPD motif and WD repeat or proteins containing same, cdc4 polypeptides, cdc53, Skp1, substrates, and
SCF complexes, may be from any species, particularly a mammalian species, including bovine, ovine, porcine,
murine, equine, preferably the human species, and from any source, whether natural, synthetic, semi-synthetic, or
recombinant.

The term "agonist" of a binding pocket refers to a compound or ligand that interacts with the binding pocket
and maintains or increases the activity of the binding pocket to which it binds. The term includes partial agonists and
inverse agonists. Agonists may include proteins, peptides, nucleic acids, carbohydrates, or any other molecules that
bind to a binding pocket. Agonists also include a molecule derived from a binding pocket. Peptide mimetics, synthetic
molecules with physical structures designed to mimic structural features of particular peptides, may serve as agonists.
The stimulation may be direct, or indirect, or by a competitive or non-competitive mechanism.

As used herein, the term "partial agonist" means an agonist that is unable to evoke the maximal response of a
biological system, even at a concentration sufficient to saturate the specific receptors.

As used herein, the term "partial inverse agonist" is an inverse agonist that evokes a submaximal response of
a biological system, even at a concentration sufficient to saturate the specific receptors. At high concentrations, it will
diminish the actions of a full inverse agonist.

The term "antagonist", as used herein, refers to a ligand or compound that binds a binding pocket but does
not maintain the activity of the binding pocket to which it binds. The term can also include a ligand that reduces the
action of another agent, such as an agonist. An antagonistic action may result from a combination with the substance
being antagonised (chemical antagonism) or the production of an opposite effect through a different protein
(functional antagonism or physiological antagonism) or as a consequence of competition for the binding site of an
intermediate that links a protein to the effect observed (indirect antagonism). The antagonist may act at the same site
as the agonist (competitive antagonism). Antagonists may include proteins, peptides, nucleic acids, carbohydrates, or
any other molecules that bind to a binding pocket. Antagonists also include a molecule derived from a binding pocket.
Peptide mimetics, synthetic molecules with physical structures designed to mimic structural features of particular
peptides, may serve as antagonists.

As used herein, the term "competitive antagonism" refers to the competition between an agonist and an
antagonist for a binding pocket of a protein that occurs when the binding of agonist and antagonist becomes mutually
exclusive. This may be because the agonist and antagonist compete for the same binding sites or pockets, or combine
with adjacent but overlapping sites. A third possibility is that different sites are involved but that they influence the
receptor macromolecules in such a way that agonist and antagonist molecules cannot be bound at the same time. If
the agonist and antagonist form only short lived combinations with a binding pocket so that equilibrium between
agonist, antagonist and binding pocket is reached during the presence of the agonist, the antagonism will be
surmountable over a wide range of concentrations. In contrast, some antagonists, when in close enough proximity to
their binding site, may form a stable covalent bond with it and the antagonism becomes insurmountable when no spare
receptors remain.

By being "derived from" a binding pocket is meant any molecular entity which is identical or substantially
equivalent to the binding pocket. A peptide derived from a binding pocket may encompass the amino acid sequence of
a naturally occurring binding pocket, any portion of that binding pocket or other molecular entity that functions to
bind to an associated or interacting binding pocket. A peptide derived from such a binding pocket will interact directly
or indirectly with an associated molecule in such a way as to mimic the native binding pocket. Such peptides may
include competitive inhibitors, peptide mimetics, and the like. The entity will not include a full length sequence of a
wild-type molecule. Peptide mimetics, synthetic molecules with physical structures designed to mimic structural
features of particular peptides, may serve as inhibitors or enhancers.

"Peptide mimetics" are structures which serve as substitutes for peptides in interactions between molecules
(see Morgan et al. (1989), Ann. Reports Med. Chem. 24:243-252 for a review). Peptide mimetics include synthetic
structures which may or may not contain amino acids and/or peptide bonds but retain the structural and functional
features of a peptide, or agonist or antagonist (i.e. enhancer or inhibitor) of a binding pocket. Peptide mimetics also
include peptoids, oligopeptoids (Simon et al. (1992) Proc. Natl. Acad. Sci. USA 89:9367), and peptide libraries
containing peptides of a designed length representing all possible sequences of amino acids corresponding to a motif,
peptide, or agonist or antagonist (i.e. enhancer or inhibitor) of the invention.

Sequences are "homologous" or considered "homologs" when at least about 70% (preferably at least about
80 to 90% and most preferably at least 95%) of the nucleotides or amino acids match over a defined length of the
molecule. "Substantially homologous" also includes sequences showing identity to the specified sequence. Percent
identity can be determined electronically, e.g., by using the MEGALIGN program (DNASTAR, Inc., Madison Wis.)
which can create alignments between two or more sequences according to different methods, e.g., the Clustal method.
(See, e.g., Higgins, D.G. and P.M. Sharp (1988) Gene 73:237-244.) Percent identity can also be determined by other
methods known in the art (e.g., the Jotun Hein method (see, e.g., Hein, J. (1990) Methods Enzymol. 183:626-645) or
by varying hybridization conditions). Preferably, the amino acid or nucleic acid sequences have an alignment score of
greater than 5 (in standard deviation units) using the program ALIGN with the mutation gap matrix and a gap penalty
of 6 or greater (Dayhoff).
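
The percent-identity criterion can be illustrated with a short calculation over two sequences that have already been aligned. This is a minimal sketch and not the MEGALIGN, Clustal or ALIGN procedure. It assumes gaps are written as "-" and that positions with a gap in either sequence are excluded from the count, which is only one of several conventions in use. The function names and peptides are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned, non-gap positions at which two sequences match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # positions with a gap in either sequence are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def is_homologous(aligned_a, aligned_b, threshold=70.0):
    """Apply the 'at least about 70%' criterion to a pre-aligned pair."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical aligned peptides, made up for illustration only.
print(round(percent_identity("LLTPPQSGKK", "LITPPQSG-K"), 1))
print(is_homologous("ACDE", "ACFG"))
```

Raising the threshold argument to 80, 90 or 95 applies the more preferred levels of identity stated above.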
BINDING POCKET

"Binding pocket" refers to a region or site of an F-box protein or molecular complex thereof (e.g. Skp1-F-box
complex, SCF E3 ubiquitin ligase) involved in substrate selection and/or orientation. As the result of its shape, a
binding pocket associates with another region of an F-box protein or SCF complex or with a substrate or a part
thereof.

In an aspect of the invention a binding pocket comprises one or more of the residues involved in selection
and/or orientation of a substrate or ligand.

In an aspect of the invention a binding pocket is provided that comprises the WD40 repeat domain of an F-box
protein. In another embodiment the binding pocket comprises a WD40 repeat domain and a helical linker of an F-box
protein. In a further embodiment, the binding pocket comprises a WD40 repeat domain, a helical linker and an F-box
domain of an F-box protein. In an embodiment the F-box protein is a cdc4 polypeptide or portion thereof.

A binding pocket of the invention may comprise a WD40 repeat domain characterized by one or more of the
following characteristics:

(a) a 7 or 8 blade β-propeller structure, in particular an 8 blade β-propeller structure;

(b) a disk like structure characterized by a cavity in the middle and two opposing circular surfaces of
different size;

(c) a conical frustum of about 40 Å top surface and about 50 Å bottom surface, an overall thickness of
30 Å and a central pore of 6 Å diameter; and

(d) a CPD binding site on the top surface of the frustum of (c) and running across the edge of the pore,
while the bottom surface of the frustum links to the F-box domain.

A binding pocket of the invention may be characterized by one or more of the following characteristics: 

(i) a dedicated pThr-Pro binding pocket; 

(ii) a deep hydrophobic pocket that selects hydrophobic residues N-terminal to the phosphorylation site 
of a CPD, and 

(iii) a through space electrostatic selection against basic residues C-terminal to the phosphorylation site 
of a CPD. 

A binding pocket of the invention may comprise a helical linker characterized by α-helices that form a stalk
and pedestal like structure that connects and orients a WD repeat domain. The helical linker binding pocket can also
be characterized by one or more of the following:

(a) a helix (e.g. α6 in Figure 2 or Figure 6) that is 30 Å in length and is anchored at its N-terminus to the
hydrophobic core of the F-box/helical extension and at its C-terminus to the hydrophobic core of a
WD repeat domain;

(b) the helix of (a) (e.g. α6) anchored at its amino terminus to an F-box through hydrophobic
interactions (e.g. involving α6 residues Phe355 and Leu356, and F-box residues Ile295, Ile296,
Leu315, Leu319 and Trp316 of Cdc4 or the corresponding residues in Cdc4 homologs, variants,
precursors etc.);

(c) a second helix (e.g. helix 5) packed along the base of the helix of (a) or (b) opposite to the F-box
domain through hydrophobic interactions (e.g. involving Tyr342, Leu338, and Leu334 of Cdc4 or
the corresponding residues in Cdc4 homologs, variants, precursors etc.);

(d) the helix of (a) (e.g. helix α6) anchored at its C-terminus through hydrophobic interactions;

(e) a C-terminal end of helix α6 inserted obliquely between propeller blades β7 and β8 of a WD40
domain through van der Waals and hydrophobic interactions (e.g. involving Trp365 and Ile361 with
WD40 domain residues Val687, Ile696, Leu726, and Phe743 in β-propeller blades 7 and 8 of Cdc4
or the corresponding residues in Cdc4 homologs, variants, precursors etc.).

A CPD motif binding pocket of the invention may comprise a hydrophobic pocket that surrounds the open
central channel of a 7 or 8 blade WD repeat propeller. A binding pocket of Cdc4 is more particularly characterized by
one or more of the following:

(a) a WD repeat domain surface composed of invariant and highly conserved residues from β-propeller
blades;

(b) a three-sided pocket formed by Trp426, Thr386, and Arg485 (or the corresponding residues in
Cdc4 homologs, variants, precursors etc.);

(c) a three-sided pocket formed by Trp426, Thr441, Thr465, and Arg485 (or the corresponding
residues in Cdc4 homologs, variants, precursors etc.);

(d) a hydrophobic pocket composed of Trp426, Trp717, Thr386, and Val384 (or the corresponding
residues in Cdc4 homologs, variants, precursors etc.);

(e) a pocket formed by Leu634, Met590, and Tyr574 (or the corresponding residues in Cdc4 homologs,
variants, precursors etc.); and

(f) a pocket formed by Arg485, Arg467, Arg534, Tyr548, and Arg572 (or the corresponding residues in
Cdc4 homologs, variants, precursors etc.).

A binding pocket may comprise one or more of the amino acid residues for an F-box protein crystal or F-box 
protein-substrate crystal identified in Table 3 or Table 4. In an aspect the binding pocket comprises the atomic
contacts of atomic interactions 1 to 4 or interactions 5 to 8/9 identified in Table 3 or Table 4. In an aspect of the 
invention the binding pocket comprises all of the amino acid residues identified in Table 3 or Table 4. 

The term "binding pocket" (BP) also includes a homolog of the binding pocket or a portion thereof. As used 
herein the term "homolog" in reference to a binding pocket refers to a binding pocket or a portion thereof which may 
have deletions, insertions or substitutions of amino acid residues as long as the binding specificity is retained. In this 
regard deliberate amino acid substitutions may be made on the basis of similarity in polarity, charge, solubility, 
hydrophobicity, hydrophilicity, and/or the amphipathic nature of the residues as long as the binding specificity of the 
binding pocket is retained. 

As used herein, the term "portion thereof" means the structural coordinates corresponding to a sufficient
number of amino acid residues of a binding pocket (or homologs thereof) that are capable of associating with a
substrate (e.g. CPD motif) or ligand. For example, the structural coordinates provided in a crystal structure may
contain a subset of the amino acid residues in a binding pocket which may be useful in the modelling and design of
compounds that bind to the binding pocket.
CRYSTAL 

The invention provides crystal structures. As used herein, the term "crystal" or "crystalline" means a
structure (such as a three dimensional (3D) solid aggregate) in which the plane faces intersect at definite angles and in
which there is a regular structure (such as internal structure) of the constituent chemical species. The term "crystal"
can include any one of: a solid physical crystal form such as an experimentally prepared crystal, a crystal structure
derivable from the crystal (including secondary and/or tertiary and/or quaternary structural elements), a 2D and/or 3D
model based on the crystal structure, a representation thereof such as a schematic representation thereof or a
diagrammatic representation thereof, or a data set thereof for a computer.

In one aspect, the crystal is usable in X-ray crystallography techniques. Here, the crystals used can withstand
exposure to X-ray beams used to produce the diffraction pattern data necessary to solve the X-ray crystallography
structure. A crystal may be characterized as being capable of diffracting X-rays in a pattern defined by one of the
crystal forms depicted in Blundell et al. 1976, Protein Crystallography, Academic Press.

A crystal of the invention is generally produced in a laboratory; that is, it is an isolated crystal produced by an
individual.

The invention contemplates a crystal comprising a binding pocket of the invention, in particular a binding
pocket of an F-box protein or SCF complex or portion thereof, involved in substrate selection and/or orientation.

In an aspect of the invention a crystal is provided that comprises the WD40 repeat domain of an F-box
protein, in particular Cdc4. In another embodiment the crystal comprises a WD40 repeat domain and a helical linker
of an F-box protein. In a further embodiment, the crystal comprises a WD40 repeat domain, a helical linker and an
F-box domain of an F-box protein. In an embodiment the F-box protein is a cdc4 polypeptide or portion thereof.

A crystal of the invention comprising a WD40 repeat domain, in particular a Cdc4 polypeptide WD40 repeat 
domain, may be characterized by one or more of the following characteristics: 

(a) a 7 or 8 blade β-propeller structure, in particular an 8 blade β-propeller structure;

(b) a disk like structure characterized by a cavity in the middle and two opposing circular surfaces of
different size;

(c) a conical frustum of about 40 Å top surface and about 50 Å bottom surface, an overall thickness of
30 Å and a central pore of 6 Å diameter; and

(d) a CPD binding site on the top surface of the frustum of (c) and running across the edge of the pore,
while the bottom surface of the frustum links to the F-box domain.

Each blade of the β-propeller structure can be further characterized by 4 anti-parallel β-strands. The disk like
structure can also be characterized by a smaller surface comprising a CPD binding site and a bottom surface anchored
by a helix (e.g. helix α6) of a helical extension of the F-box protein. As illustrated in Figures 2 and 3 the structure is
further characterized by β-propeller blade 2 consisting of 5 β-strands and a strand β9' forming a parallel arrangement
with strand β9.

A crystal of a binding pocket of an F-box protein of the invention, in particular a Cdc4 polypeptide, may be
characterized by one or more of the following characteristics:

(i) a dedicated pThr-Pro binding pocket;

(ii) a deep hydrophobic pocket that selects hydrophobic residues N-terminal to the phosphorylation site
of a CPD motif; and

(iii) a through-space electrostatic selection against basic residues C-terminal to the phosphorylation site
of a CPD motif.

In a preferred embodiment, a crystal of a WD40 repeat domain has the structure illustrated in Figure 2 or 3.

A crystal of the invention can comprise a helical linker characterized by α helices that form a stalk and
pedestal like structure that connects and orients a WD repeat domain. A helical linker structure of a Cdc4 polypeptide
can also be characterized by one or more of the following structures:

(a) a helix (e.g. α6 in Figure 2 or Figure 6) that is 30 Å in length and is anchored at its N-terminus to the
hydrophobic core of the F-box/helical extension and at its C-terminus to the hydrophobic core of a
WD repeat domain;

(b) the helix of (a) (e.g. α6) anchored at its amino terminus to an F-box through hydrophobic
interactions (e.g. involving α6 residues Phe355 and Leu356, and F-box residues Ile295, Ile296,
Leu315 and Trp316, or the corresponding residues in Cdc4 homologs, variants, precursors etc.);

(c) a second helix (e.g. helix α5) packed along the base of the helix of (a) or (b) opposite to the F-box
domain through hydrophobic interactions (e.g. involving Tyr342, Leu338, and Leu334, or the
corresponding residues in Cdc4 homologs, variants, precursors etc.);

(d) the helix of (a) (e.g. helix α6) anchored at its C-terminus through hydrophobic interactions;

(e) a C-terminal end of helix α6 inserted obliquely between propeller blades β7 and β8 of the WD40
domain through van der Waals and hydrophobic interactions (e.g. involving Trp365 and Ile361 with
WD40 domain residues Val687, Ile696, Leu726, and Phe743 in β-propeller blades 7 and 8, or the
corresponding residues in Cdc4 homologs, variants, precursors etc.).

In a preferred embodiment, a crystal of a helical linker has the structure illustrated in Figure 2. 

A crystal of the invention may comprise a CPD motif binding pocket that is characterized by a hydrophobic
pocket that surrounds the open central channel of a 7 or 8 blade WD repeat propeller. A crystal of a Cdc4 polypeptide
may be more particularly characterized by one or more of the following:

(a) a WD repeat domain surface composed of invariant and highly conserved residues from β-propeller
blades;

(b) a three-sided pocket formed by Trp 426, Thr 386, and Arg 485 (or the corresponding residues in
Cdc4 homologs, variants, precursors etc.);

(c) a three-sided pocket formed by Trp 426, Thr 441, Thr 465, and Arg 485 (or the corresponding
residues in Cdc4 homologs, variants, precursors etc.);

(d) a hydrophobic pocket composed of Trp 426, Trp 717, Thr 386, and Val 384 (or the corresponding
residues in Cdc4 homologs, variants, precursors etc.);

(e) a pocket formed by Leu 634, Met 590, and Tyr 574 (or the corresponding residues in Cdc4 homologs,
variants, precursors etc.); and

(f) a pocket formed by Arg 485, Arg 467, Arg 534, Tyr 548, and Arg 572 (or the corresponding residues in
Cdc4 homologs, variants, precursors etc.).
In a preferred embodiment, a crystal of a CPD motif binding pocket has the structure illustrated in Figure 3,
4, 6 or 7.

In a further aspect of the invention a crystal is provided comprising an F-box domain comprising five α
helices. In a preferred embodiment, a crystal of an F-box domain has the structure illustrated in Figure 2 or Figure 6.
A crystal of the invention may comprise an F-box protein characterized by one or more of the following:

(a) an F-box domain consisting of five α helices;

(b) a WD40 repeat domain characterized by 7 or 8 copies of a WD40 repeat motif forming a 7 or 8
blade β-propeller structure; and

(c) two α helices that, together with two α helices of the F-box domain, form a stalk and pedestal like
structure that connects and orients the WD40 domain.

With reference to a crystal of the present invention, residues in a binding pocket may be defined by their 
spatial proximity to a substrate or ligand in the crystal structure. For example, a binding pocket may be defined by its 
proximity to a substrate molecule, or modulator. 
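
The proximity criterion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 4.0 Å cutoff, the function name `residues_near_ligand`, and the coordinates are hypothetical and are not taken from Table 6 or any other part of this disclosure.

```python
import math

def residues_near_ligand(protein_atoms, ligand_atoms, cutoff=4.0):
    """Return the sorted labels of residues having any atom within
    `cutoff` angstroms of any ligand atom (hypothetical helper)."""
    near = set()
    for residue, xyz in protein_atoms:
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_atoms):
            near.add(residue)
    return sorted(near)

# Made-up coordinates (in angstroms) for illustration only.
protein_atoms = [
    ("Arg485", (0.0, 0.0, 0.0)),
    ("Trp426", (3.0, 0.0, 0.0)),
    ("Leu634", (12.0, 0.0, 0.0)),
]
ligand_atoms = [(1.5, 1.0, 0.0)]

pocket = residues_near_ligand(protein_atoms, ligand_atoms)
print(pocket)
```

In practice the atom lists would be read from a coordinate file, and the cutoff would be chosen to suit the type of contact being examined.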

A crystal of the invention includes a binding pocket in association with one or more moieties, including 
heavy-metal atoms i.e. a derivative crystal, or one or more substrates or ligands i.e. a co-crystal. 

The term "associate", "association" or "associating" refers to a condition of proximity between a moiety (i.e.
chemical entity or compound or portions or fragments thereof), and a binding pocket. The association may be
non-covalent, i.e. where the juxtaposition is energetically favored by, for example, hydrogen-bonding, van der Waals, or
electrostatic or hydrophobic interactions, or it may be covalent.

The term "heavy-metal atom" refers to an atom that can be used to solve an X-ray crystallography phase
problem, including but not limited to a transition element, a lanthanide metal, or an actinide metal. Lanthanide metals
include elements with atomic numbers between 57 and 71, inclusive. Actinide metals include elements with atomic
numbers between 89 and 103, inclusive.

Multiwavelength anomalous diffraction (MAD) phasing may be used to solve protein structures using
selenomethionyl (SeMet) proteins. Therefore, a complex of the invention may comprise a crystalline binding pocket
with selenium on the methionine residues of the protein.

A crystal may comprise a complex between a binding pocket and one or more substrates or ligands. In other
words the binding pocket may be associated with one or more ligands or molecules in the crystal. The ligand may be
any compound that is capable of stably and specifically associating with the binding pocket. A ligand may, for
example, be a modulator or analogue thereof. Therefore, a crystal may comprise a binding pocket comprising two or
more of the amino acid residues of an F-box protein structure as described herein, that are capable of associating with
or coordinating a CPD motif as described herein.

In an embodiment, a crystal of the invention comprises a complex between a binding pocket, and a substrate
or analogue thereof. Therefore, the present invention also provides a crystal comprising a binding pocket of an F-box
protein or a SCF complex and a substrate or analogue thereof. A substrate may be, for example, a CPD motif or CPD
motif containing protein. An analog of a substrate is one which mimics the substrate molecule, binding in the binding
pocket, but which is incapable of taking part (or has a significantly reduced capacity to take part) in SCF E3 ubiquitin
ligase activity.

In an embodiment, a crystal comprising a WD repeat domain of a Cdc4 polypeptide and a CPD motif is
provided, which is characterized by one or more of the following:

(a) a WD40 repeat domain characterized by 7 or 8 copies of a WD40 repeat motif forming a 7 or 8
blade β-propeller structure comprising β-propeller blades 1, 2, 3, 4, 5, 6, and 7, and optionally 8;

(b) the CPD motif binds in an extended manner across β-propeller blade 2 with the N-terminus oriented
toward the central cavity of the WD repeat domain and the C-terminus oriented towards the outer
rim;

(c) the CPD binding surface of the WD repeat domain is composed of invariant and highly conserved
residues from β-propeller blades 1 to 6 and optionally 8;

(d) a P0 phosphate pThr of the CPD motif forms direct electrostatic interactions with the guanidinium
groups of Arg 485, Arg 467, and Arg 534 and a direct hydrogen bond with the side chain of Tyr 548
(or the corresponding residues in Cdc4 homologs, variants, precursors etc.);

(e) P+1 proline side chains of the CPD motif project into a three-sided pocket on the CPD binding
surface formed by the side chains of Trp 426 and Arg 485 or Trp 426, Thr 441, Thr 465, and Arg 485
(or the corresponding residues in Cdc4 homologs, variants, precursors etc.); and

(f) the P-1 leucine side chain of the CPD motif is oriented towards a hydrophobic pocket composed of
residues Trp 426, Trp 717, Thr 386, and Val 384 (or the corresponding residues in Cdc4 homologs,
variants, precursors etc.).

In a preferred embodiment, a crystal of a complex of a WD repeat domain and a CPD motif has the structure
illustrated in Figure 2, 3, 4, 6, or 7.

A crystal or secondary or three-dimensional structure of a binding pocket of an F-box protein may be
specifically defined by one or more of the atomic contacts of the atomic interactions identified in Table 3 or Table 4.
The atomic interactions in Table 3 or Table 4 are defined therein by an atomic contact (more preferably, a specific
atom of an amino acid residue where indicated) on the F-box protein, in particular on the WD40 repeat domain or
helical linker, and an atomic contact (more preferably, a specific atom of an amino acid residue where indicated) on a
substrate e.g. CPD motif, or an atomic contact (more preferably, a specific atom of an amino acid residue where
indicated) on another region of the F-box protein (e.g. helical linker or F-box domain). In certain embodiments, a
crystal of the invention comprises the atomic contacts of atomic interactions 1 to 8 identified in Table 3 or Table 4. In
certain particular embodiments a crystal is provided comprising the atomic contacts of atomic interactions 1 to 4 or 5
to 8. Preferably, a crystal is defined by the atoms of the atomic contacts in the binding pocket having the structural
coordinates for the atoms listed in Table 6.

A structure of a complex may be defined by selected intermolecular contacts, preferably the structural 
coordinates of the intermolecular contacts as defined in Table 6, preferably interactions 5 to 8. 

A crystal of the invention may comprise one or more of the following groups of amino acid residues: (a) Ile
295, Ile 296, Leu 315, Trp 316, Leu 319, Phe 355, and Leu 356; (b) Val 687, Ile 696, Leu 726, Phe 743, Trp 365, and
Ile 364; (c) Asn 684, Arg 700, and Glu 323; (d) Arg 485, Arg 467, Arg 534, Tyr 548; (e) Trp 426, Arg 485, Thr 441,
and Thr 465; (f) Trp 426, Trp 717, Thr 386, and Val 384; (g) Tyr 574, Thr 386 and Val 384; (h) Tyr 574, Met 590,
and Leu 634; and (i) the corresponding residues in Cdc4 homologs, paralogs, variants, or precursors. Preferably the
atoms of the amino acid residues have the structural coordinates as set out in Table 6.

A crystal of the invention may enable the determination of structural data for a substrate or ligand. In order to
be able to derive structural data for a ligand or substrate, it is necessary for the molecule to have sufficiently strong
electron density to enable a model of the molecule to be built using standard techniques. For example, there should be
sufficient electron density to allow a model to be built using XTALVIEW (McRee 1992 J. Mol. Graphics 10:44-46).

A crystal of the invention may belong to space group P3₂. The term "space group" refers to the lattice and
symmetry of the crystal. In a space group designation the capital letter indicates the lattice type and the other symbols
represent symmetry operations that can be carried out on the contents of the asymmetric unit without changing its
appearance.

A crystal of the invention may comprise a unit cell having the following unit dimensions: a = 107.7 Å, b =
107.7 Å, c = 168.3 Å, α = γ = 90°, β = 120°. The term "unit cell" refers to the smallest and simplest volume element (i.e.
parallelepiped-shaped block) of a crystal that is completely representative of the unit of pattern of the crystal. The unit
cell axial lengths are represented by a, b, and c. Those of skill in the art understand that a set of atomic coordinates
determined by X-ray crystallography is not without standard error.
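
As a worked illustration, the volume of a unit cell can be computed from its six parameters with the general parallelepiped formula. The sketch below is a minimal example that reads the cell parameters quoted above as a = b = 107.7 Å, c = 168.3 Å, α = γ = 90° and β = 120°; the function name `unit_cell_volume` is an assumption introduced for this example.

```python
import math

def unit_cell_volume(a, b, c, alpha, beta, gamma):
    """Volume (cubic angstroms) of a cell with edges in angstroms and angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    # General formula for the volume of a parallelepiped from edges and angles.
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)

volume = unit_cell_volume(107.7, 107.7, 168.3, 90.0, 120.0, 90.0)
print(f"{volume:.0f} cubic angstroms")
```

With two right angles the expression reduces to a·b·c·sin(120°), roughly 1.69 × 10⁶ Å³ for these values.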

In a preferred embodiment, a crystal of the invention has the structural coordinates as shown in Table 6. As
used herein, the term "structural coordinates" refers to a set of values that define the position of one or more amino
acid residues with reference to a system of axes. The term refers to a data set that defines the three dimensional
structure of a molecule or molecules (e.g. Cartesian coordinates, temperature factors, and occupancies). Structural
coordinates can be slightly modified and still render nearly identical three dimensional structures. A measure of a
unique set of structural coordinates is the root-mean-square deviation of the resulting structure. Structural coordinates
that render three dimensional structures (in particular a three dimensional structure of a ligand binding pocket) that
deviate from one another by a root-mean-square deviation of less than 5 Å, 4 Å, 3 Å, 2 Å, 1.5 Å, 1.0 Å, or 0.5 Å may
be viewed by a person of ordinary skill in the art as very similar.
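
The root-mean-square deviation between two coordinate sets can be sketched as follows. This minimal example assumes the two sets list equivalent atoms in the same order and have already been superimposed; it does not perform the superposition itself. The function name `rmsd` and the coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (angstroms) between paired atom positions."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

# Made-up coordinates: the second set is the first displaced by 0.3 angstroms along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
variant = [(x, y, z + 0.3) for x, y, z in reference]

deviation = rmsd(reference, variant)
within_threshold = deviation < 0.5  # tightest threshold listed above
print(round(deviation, 3), within_threshold)
```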

Variations in structural coordinates may be generated because of mathematical manipulations of the
structural coordinates of a structure or binding pocket described herein. For example, the structural coordinates of
Table 6 may be manipulated by crystallographic permutations of the structural coordinates, fractionalization of the
structural coordinates, integer additions or subtractions to sets of the structural coordinates, inversion of the structural
coordinates or any combination of the above.
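
Three of these manipulations can be illustrated with a short sketch: conversion between Cartesian and fractional coordinates (taken here as the meaning of "fractionalization"), addition of integers to fractional coordinates, and inversion through the origin. The cell parameters are read from this description as a = b = 107.7 Å, c = 168.3 Å, α = γ = 90°, β = 120°; the axis convention (a along x, b in the xy-plane), the function names, and the sample point are assumptions made for the example.

```python
import math

def orthogonalization_matrix(a, b, c, alpha, beta, gamma):
    """Upper-triangular matrix mapping fractional to Cartesian coordinates."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    volume = a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, volume / (a * b * sg)],
    ]

def frac_to_cart(m, f):
    return tuple(sum(m[i][j] * f[j] for j in range(3)) for i in range(3))

def cart_to_frac(m, x):
    # Back substitution, valid because the matrix is upper triangular.
    w = x[2] / m[2][2]
    v = (x[1] - m[1][2] * w) / m[1][1]
    u = (x[0] - m[0][1] * v - m[0][2] * w) / m[0][0]
    return (u, v, w)

cell = orthogonalization_matrix(107.7, 107.7, 168.3, 90.0, 120.0, 90.0)
frac = (0.25, 0.5, 0.75)                   # made-up fractional position
cart = frac_to_cart(cell, frac)            # Cartesian coordinates in angstroms
round_trip = cart_to_frac(cell, cart)      # fractionalization recovers `frac`

# Integer additions move the point to an equivalent position in another cell.
shifted = tuple(f + n for f, n in zip(frac, (1, 0, -1)))
wrapped = tuple(f % 1.0 for f in shifted)  # same position within the cell

inverted = tuple(-x for x in cart)         # inversion through the origin
```

Each manipulation changes the numerical values while leaving the interatomic distances within the structure unchanged.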

Variations in the crystal structure due to mutations, additions, substitutions, and/or deletions of the amino
acids, or other changes in any of the components that make up the crystal may also account for modifications in
structural coordinates. If such modifications are within an acceptable standard error as compared to the original
structural coordinates, the resulting structure may be the same. Therefore, a ligand that bound to a binding pocket of
an F-box protein would also be expected to bind to another binding pocket whose structural coordinates defined a
shape that fell within the acceptable error. Such modified structures of a binding pocket thereof are also within the
scope of the invention.

Various computational analyses may be used to determine whether a molecule or the binding pocket thereof 
is sufficiently similar to all or parts of an F-box or a binding pocket thereof. Such analyses may be carried out using 
conventional software applications and methods as described herein. 

A crystal of the invention may also be specifically characterised by the parameters, diffraction statistics 
and/or refinement statistics set out in Table 1 or in Table 2. 

Illustrations of particular crystals of the invention are shown in Figures 2, 3, 4, 6 and 7. 
METHOD OF MAKING A CRYSTAL 

The present invention also provides a method of making a crystal according to the invention. The crystal may 
be formed from an aqueous solution comprising a purified polypeptide comprising an F-box protein including a 
variant, part, homolog, or fragment thereof (e.g. a binding pocket). A method may utilize a purified polypeptide 
comprising a binding pocket to form a crystal. A method may utilize one or more purified mutant polypeptides as 
described herein. In an embodiment, a mutant cdc4 polypeptide is used to make crystals. 

The term "purified" in reference to a polypeptide does not require absolute purity such as a homogeneous
preparation; rather, it represents an indication that the polypeptide is relatively purer than in the natural environment.
Generally, a purified polypeptide is substantially free of other proteins, lipids, carbohydrates, or other materials with
which it is naturally associated, preferably at a functionally significant level, for example at least 85% pure, more
preferably at least 95% pure, most preferably at least 99% pure. A skilled artisan can purify such a polypeptide
using standard techniques for protein purification. A substantially pure polypeptide will yield a single major band on a
non-reducing polyacrylamide gel. Purity of the polypeptide can also be determined by amino-terminal amino acid
sequence analysis.

A polypeptide used in the method may be chemically synthesized in whole or in part using techniques that 
are well-known in the art. Alternatively, methods are well known to the skilled artisan to construct expression vectors 
containing a native or mutated protein coding sequence and appropriate transcriptional/translational control signals. 
These methods include in vitro recombinant DNA techniques, synthetic techniques, and in vivo recombination/genetic 
recombination. See for example the techniques described in Sambrook et al. (Molecular Cloning: A Laboratory
Manual, 2nd Edition, Cold Spring Harbor Laboratory Press (1989)), and other laboratory textbooks. (See also Sarker
et al, Glycoconjugate J. 7:380, 1990; Sarker et al, Proc. Natl. Acad. Sci. USA 88:234-238, 1991; Sarker et al,
Glycoconjugate J. 11:204-209, 1994; Hull et al, Biochem Biophys Res Commun 176:608, 1991 and Pownall et al,
Genomics 12:699-704, 1992).

Crystals may be grown from an aqueous solution containing the purified polypeptide by a variety of
conventional processes. These processes include batch, liquid bridge, dialysis, vapor diffusion, and hanging drop
methods. (See for example, McPherson, 1982 John Wiley, New York; McPherson, 1990, Eur. J. Biochem. 189:1-23;
Weber, 1991, Adv. Protein Chem. 41:1-36). Generally, native crystals of the invention are grown by adding
precipitants to the concentrated solution of the polypeptide. The precipitants are added at a concentration just below
that necessary to precipitate the protein. Water is removed by controlled evaporation to produce precipitating
conditions, which are maintained until crystal growth ceases.

Derivative crystals of the invention can be obtained by soaking native crystals in a solution containing salts
of heavy metal atoms. A complex of the invention can be obtained by soaking a native crystal in a solution containing
a compound that binds the polypeptide, or it can be obtained by co-crystallizing the polypeptide in the presence of
one or more compounds. In order to obtain co-crystals with a compound which binds deep within the tertiary
structure of the polypeptide it is necessary to use the second method, i.e. co-crystallization.

Once the crystal is grown it can be placed in a glass capillary tube and mounted onto a holding device
connected to an X-ray generator and an X-ray detection device. Collection of X-ray diffraction patterns is well
documented by those skilled in the art (See for example, Ducruix and Giegé, 1992, IRL Press, Oxford, England). A
beam of X-rays enters the crystal and diffracts from the crystal. An X-ray detection device can be utilized to record the
diffraction patterns emanating from the crystal. Suitable devices include the Mar 345 imaging plate detector system
with an RU200 rotating anode generator.

Multiwavelength anomalous diffraction (MAD) phasing using selenomethionyl (SeMet) proteins may be
used to determine a crystal structure of the invention. Thus, the invention contemplates a method for determining a crystal
structure of the invention using a selenomethionyl derivative of an F-box protein or SCF complex, including a variant,
part, homolog or fragment thereof.

Methods for obtaining the three dimensional structure of the crystalline form of a molecule or complex are
described herein and known to those skilled in the art (see Ducruix and Giegé 1992, IRL Press, Oxford, England).
Generally, the X-ray crystal structure is given by the diffraction patterns. Each diffraction pattern reflection is
characterized as a vector and the data collected at this stage determines the amplitude of each vector. The phases of
the vectors may be determined by the isomorphous replacement method where heavy atoms soaked into the crystal are
used as reference points in the X-ray analysis (see for example, Otwinowski, 1991, Daresbury, United Kingdom, 80-86).
The phases of the vectors may also be determined by molecular replacement (see for example, Naraza, 1994,
Proteins 11:281-296). The amplitudes and phases of vectors from the crystalline form determined in accordance with
these methods can be used to analyze other related crystalline polypeptides.
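
Once both the amplitude and the phase of each reflection are known, the reflections can be summed as a Fourier series to give electron density. The following one-dimensional toy sketch illustrates the idea; it is not a crystallographic program, and the two point "atoms", their weights, the resolution limit of h = ±10, and the function names are assumptions made for the example.

```python
import cmath
import math

def structure_factor(h, atoms):
    """F(h) for point atoms given as (fractional position, scattering weight)."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)

def density(x, reflections):
    """Fourier synthesis at fractional position x from {h: (amplitude, phase)}."""
    total = sum(
        amp * cmath.exp(1j * phase) * cmath.exp(-2j * math.pi * h * x)
        for h, (amp, phase) in reflections.items()
    )
    return total.real

# Toy one-dimensional "crystal": a lighter atom at 0.2 and a heavier one at 0.7.
atoms = [(0.2, 6.0), (0.7, 8.0)]

# Simulated measurement: an amplitude and a phase for each reflection.
reflections = {}
for h in range(-10, 11):
    f_h = structure_factor(h, atoms)
    reflections[h] = (abs(f_h), cmath.phase(f_h))

grid = [i / 100 for i in range(100)]
rho = [density(x, reflections) for x in grid]
peak_position = grid[rho.index(max(rho))]
print(peak_position)
```

The strongest peak of the synthesized density falls at the position of the heavier atom, which is the sense in which amplitudes together with phases locate the atoms.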

The unit cell dimensions and symmetry, and vector amplitude and phase information can be used in a Fourier
transform function to calculate the electron density in the unit cell, i.e. to generate an experimental electron density
map. This may be accomplished using the PHASES package (Furey, 1990). Amino acid sequence structures are fit to
the experimental electron density map (i.e. model building) using computer programs (e.g. Jones, T.A. et al., Acta
Crystallogr A47, 100-119, 1991). This structure can also be used to calculate a theoretical electron density map. The
theoretical and experimental electron density maps can be compared and the agreement between the maps can be
described by a parameter referred to as the R-factor. A high degree of overlap in the maps is represented by a low
R-factor value. The R-factor can be minimized by using computer programs that refine the structure to achieve agreement
between the theoretical and observed electron density map. For example, the XPLOR program, developed by Brunger
(1992, Nature 355:472-475) can be used for model refinement.
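
The conventional crystallographic R-factor is computed from structure-factor amplitudes rather than from the maps directly, as the sum of absolute differences between observed and calculated amplitudes divided by the sum of observed amplitudes. A minimal sketch follows; the amplitude values are made up for illustration, and the function name `r_factor` is an assumption.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over paired reflections."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("amplitude lists must be non-empty and of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

# Made-up amplitudes for four reflections.
observed = [100.0, 50.0, 25.0, 80.0]
calculated = [90.0, 55.0, 20.0, 85.0]

r_value = r_factor(observed, calculated)
print(f"R = {r_value:.3f}")
```

A value of zero would indicate perfect agreement between the model and the data.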

A three dimensional structure of the molecule or complex may be described by atoms that fit the theoretical
electron density characterized by a minimum R value. Files can be created for the structure that define each atom by
coordinates in three dimensions.
MUTANT CDC4 POLYPEPTIDES 

The present invention provides novel mutant cdc4 polypeptides. 

A particular mutant of the present invention is a polypeptide having an amino acid sequence of a cdc4
polypeptide wherein amino acid residues are replaced or deleted providing a cdc4 polypeptide that can be produced by
recombinant techniques and retains its activity, for example its ability to associate with a CPD motif.

In an aspect a cdc4 sequence is mutated by deleting the regions outside the span from the beginning of the F-box
domain to the end of the WD40 repeat domain. In particular, terminal residues 1 to 262 and 745 to 779 can be deleted
from the cdc4 sequence.

Other additions, substitutions, and/or deletions may be made to the cdc4 mutants of the present invention. In an
embodiment cdc4 can be engineered to remove flexible loops comprising residues 601 to 604 and 609 to 624.
Particular mutant cdc4 polypeptides of the invention are also identified in Table 5.
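
The residue ranges quoted above can be turned into a list of retained residue numbers with a short sketch. Two assumptions are made for illustration: the full-length sequence is taken to be 779 residues (the highest residue number mentioned), and the terminal and loop deletions are combined in a single construct. The function name `retained_residues` is hypothetical.

```python
def retained_residues(length, deletions):
    """Return the 1-based residue numbers left after removing inclusive ranges."""
    removed = set()
    for first, last in deletions:
        if not 1 <= first <= last <= length:
            raise ValueError(f"invalid range {first}-{last}")
        removed.update(range(first, last + 1))
    return [n for n in range(1, length + 1) if n not in removed]

FULL_LENGTH = 779                            # assumed full-length residue count
TERMINAL_DELETIONS = [(1, 262), (745, 779)]  # ranges quoted in the text
LOOP_DELETIONS = [(601, 604), (609, 624)]    # ranges quoted in the text

kept = retained_residues(FULL_LENGTH, TERMINAL_DELETIONS + LOOP_DELETIONS)
print(kept[0], kept[-1], len(kept))
```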

The present invention also relates to nucleic acid molecules or polynucleotides encoding a cdc4 mutant
polypeptide. The polynucleotides can be used to transform host cells to express the cdc4 mutant polypeptides of the
invention. They can also be used as a probe to detect related enzymes.

The present invention still further relates to recombinant vectors that include the nucleic acid molecules of
the invention. The nucleic acid molecules of the invention may be inserted into an appropriate vector, and the vector
may contain the necessary elements for the transcription and translation of an inserted coding sequence. Accordingly,
vectors may be constructed which comprise a nucleic acid molecule of the invention, and where appropriate one or
more transcription and translation elements linked to the nucleic acid molecule. A vector can be used to transform host
cells. Therefore, the invention provides host cells containing a vector of the invention. As well, the invention
provides methods of making such vectors and host cells.

The mutant cdc4 polypeptides of the invention can be encoded, expressed, and purified by any one of a number
of methods known to those skilled in the art. Preferred production methods will depend on many factors including the
costs and availability of materials and other economic considerations. The optimum production procedure for a given
situation will be apparent to those skilled in the art through minimal experimentation.

In accordance with an aspect of the present invention, there is provided a process for producing a cdc4
mutant polypeptide by recombinant techniques utilizing the nucleic acid molecules of the invention. The method may
comprise culturing recombinant host cells containing a nucleic acid sequence encoding a cdc4 mutant polypeptide,
under conditions promoting expression of the cdc4 mutant polypeptide, and subsequent recovery of the cdc4 mutant
polypeptide.

The invention further broadly contemplates a recombinant cdc4 mutant polypeptide obtained using a method 
of the invention. 

A cdc4 mutant polypeptide of the invention may be conjugated with other molecules, such as polypeptides,
to prepare fusion polypeptides or chimeric polypeptides. This may be accomplished, for example, by the synthesis of
N-terminal or C-terminal fusion polypeptides.

The invention further contemplates antibodies having specificity against a cdc4 mutant polypeptide of the
invention. Antibodies may be labeled with a detectable substance and used to detect cdc4 mutant polypeptides. In
another embodiment, the invention provides an isolated antibody that binds specifically to a cdc4 mutant polypeptide.

The cdc4 mutant polypeptides of the present invention are particularly well suited for use in screening methods 
for identifying modulators of cdc4 or SCF complexes. 

Still further the invention provides a method for evaluating a test compound for its ability to modulate the
biological activity of a cdc4 polypeptide. In this application, "modulate" refers to a change or an alteration in the
biological activity of a cdc4 polypeptide. Modulation may be an increase or a decrease in activity, a change in
characteristics (e.g. kinetic characteristics), or any other change in the biological, functional, or immunological
properties of the polypeptide.

The substances and compounds identified using the methods of the invention may be used to modulate the
biological activity of a cdc4 polypeptide or a SCF complex, and they may be used in the treatment of conditions
mediated by a cdc4 polypeptide or SCF complex. Accordingly, the substances and compounds may be formulated into
compositions for administration to individuals suffering from one or more of these conditions. Therefore, the present
invention also relates to a composition comprising one or more of a substance or compound identified using a method
of the invention, and a pharmaceutically acceptable carrier, excipient or diluent. A method for treating or preventing
these conditions is also provided comprising administering to a patient in need thereof, a composition of the invention.
MODEL

A crystal structure of the present invention may be used to make a model of a binding pocket of a SCF E3
ubiquitin ligase, in particular an F-box protein, that is involved in substrate selection and/or orientation. A model may,
for example, be a structural model or a computer model. A model may represent the secondary, tertiary and/or
quaternary structure of the binding pocket. The model itself may be in two or three dimensions. It is possible for a
computer model to be in three dimensions despite the constraints imposed by a conventional computer screen, if it is
possible to scroll along at least a pair of axes, causing "rotation" of the image.

As used herein, the term "modelling" includes the quantitative and qualitative analysis of molecular structure
and/or function based on atomic structural information and interaction models. The term "modelling" includes
conventional numeric-based molecular dynamic and energy minimization models, interactive computer graphic
models, modified molecular mechanics models, distance geometry and other structure-based constraint models.

Preferably, modelling is performed using a computer and may be further optimized using known methods.
This is called modelling optimisation.

An integral step to an approach of the invention for designing modulators (e.g. inhibitors) of a subject F-box
protein or SCF complex involves construction of computer graphics models of a binding pocket of the invention
which can be used to design pharmacophores by rational drug design. For instance, for an inhibitor to interact
optimally with the subject binding pocket, it will generally be desirable that it have a shape which is at least partly
complementary to that of a particular binding pocket of the protein, as for example those binding pockets of the
protein which are involved in recognition of a ligand (e.g. substrate). Additionally, other factors, including
electrostatic interactions, hydrogen bonding, hydrophobic interactions, desolvation effects, and cooperative motions of
ligand and receptor, all influence the binding effect and should be taken into account in attempts to design bioactive
modulators (e.g. inhibitors).

As described herein, a computer-generated molecular model of the subject binding pockets can be created. 
In preferred embodiments, at least the Cot-carbon positions of the binding pockets are mapped to a particular 
coordinate pattern, such as the coordinates for a binding pocket in Table 6, by homology modeling, and the structure 

15 of the protein and velocities of each atom are calculated at a simulation temperature (T 0 ) at which the docking 
simulation is to be determined. Typically, such a protocol involves primarily the prediction of side-chain 
conformations in the modeled binding pocket, while assuming a main-chain trace taken from a tertiary structure such 
as provided in Table 6 and the Figures. Computer programs for performing energy minimization routines are 
commonly used to generate molecular models. For example, both the CHARMM (Brooks et al (1983) J Comput
Chem 4:187-217) and AMBER (Weiner et al (1981) J Comput Chem. 106: 765) algorithms handle all of the
molecular system setup, force field calculation, and analysis (see also, Eisenfield et al. (1991) Am J Physiol
261:C376-386; Lybrand (1991) J Pharm Belg 46:49-54; Froimowitz (1990) Biotechniques 8:640-644; Burbam et al. (1990)
Proteins 7:99-111; Pedersen (1985) Environ Health Perspect 61:185-190; and Kini et al. (1991) J Biomol Struct Dyn
9:475-488). At the heart of these programs is a set of subroutines that, given the position of every atom in the model,
calculate the total potential energy of the system and the force on each atom. These programs may utilize a starting
set of atomic coordinates, such as the coordinates provided in Table 6, the parameters for the various terms of the 
potential energy function, and a description of the molecular topology (the covalent structure). Common features of 
such molecular modeling methods include: provisions for handling hydrogen bonds and other constraint forces; the 
use of periodic boundary conditions; and provisions for occasionally adjusting positions, velocities, or other
parameters in order to maintain or change temperature, pressure, volume, forces of constraint, or other externally
controlled conditions. 

Most conventional energy minimization methods use the input data described above and the fact that the 
potential energy function is an explicit, differentiable function of Cartesian coordinates, to calculate the potential 
energy and its gradient (which gives the force on each atom) for any set of atomic positions. This information can be 
used to generate a new set of coordinates in an effort to reduce the total potential energy and, by repeating this process
over and over, to optimize the molecular structure under a given set of external conditions. These energy
minimization methods are routinely applied to molecules similar to the subject proteins as well as nucleic acids, 
polymers and zeolites. 
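
By way of illustration only, the iterative procedure described above (evaluate the potential energy and its gradient, then move the coordinates so as to lower the energy) may be sketched as a steepest-descent loop. The sketch assumes a simple Lennard-Jones pair potential in reduced units as a stand-in for a full force field such as CHARMM or AMBER; the function names, step sizes and tolerances are hypothetical and are not taken from any of the cited programs.

```python
import math

def lj_energy_and_gradient(coords, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy and its gradient for a list of 3-D points.

    The force on each atom is the negative of the returned gradient.
    """
    n = len(coords)
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[i][k] - coords[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
            # Derivative of the 12-6 potential with respect to the distance r.
            de_dr = 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r
            for k in range(3):
                g = de_dr * d[k] / r
                grad[i][k] += g
                grad[j][k] -= g
    return energy, grad

def steepest_descent(coords, step=0.01, max_iter=5000, tol=1e-5):
    """Repeatedly move atoms against the gradient, accepting only downhill moves."""
    coords = [list(c) for c in coords]
    energy, grad = lj_energy_and_gradient(coords)
    for _ in range(max_iter):
        if max(abs(g) for atom in grad for g in atom) < tol:
            break
        trial = [[c[k] - step * g[k] for k in range(3)]
                 for c, g in zip(coords, grad)]
        e_trial, g_trial = lj_energy_and_gradient(trial)
        if e_trial < energy:
            coords, energy, grad = trial, e_trial, g_trial
            step *= 1.2   # cautiously lengthen the step after a success
        else:
            step *= 0.5   # shorten the step after an uphill move
    return coords, energy
```

For two atoms started 1.5 reduced units apart, the loop is expected to settle near the Lennard-Jones minimum separation of 2^(1/6) sigma. Production programs generally use more efficient schemes (for example conjugate gradients), so this is a sketch of the principle rather than of any particular package.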

In general, energy minimization methods can be carried out for a given temperature, Ti, which may be
different than the docking simulation temperature, T0. Upon energy minimization of the molecule at Ti, coordinates
and velocities of all the atoms in the system are computed. Additionally, the normal modes of the system are 
calculated. It will be appreciated by those skilled in the art that each normal mode is a collective, periodic motion, 
with all parts of the system moving in phase with each other, and that the motion of the molecule is the superposition 
of all normal modes. For a given temperature, the mean square amplitude of motion in a particular mode is inversely
proportional to the effective force constant for that mode, so that the motion of the molecule will often be dominated
by the low frequency vibrations. 
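
The inverse relationship noted above can be written out under a harmonic approximation using the equipartition theorem. The symbols are introduced here for illustration only and do not appear elsewhere in this document: q_m is the displacement along mode m, \kappa_m its effective force constant, \mu_m its effective mass, \omega_m its angular frequency, k_B the Boltzmann constant and T the temperature.

```latex
\tfrac{1}{2}\,\kappa_m \langle q_m^2 \rangle = \tfrac{1}{2}\,k_B T
\quad\Longrightarrow\quad
\langle q_m^2 \rangle = \frac{k_B T}{\kappa_m} = \frac{k_B T}{\mu_m\,\omega_m^2}
```

Because the mean square amplitude scales as 1/\omega_m^2 for a given effective mass, the modes of lowest frequency contribute the largest displacements.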

After the molecular model has been energy minimized at Ti, the system is "heated" or "cooled" to the
simulation temperature, T0, by carrying out an equilibration run where the velocities of the atoms are scaled in a
stepwise manner until the desired temperature, T0, is reached. The system is further equilibrated for a specified period of
time until certain properties of the system, such as average kinetic energy, remain constant. The coordinates and
velocities of each atom are then obtained from the equilibrated system. 
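
A minimal sketch of the stepwise velocity scaling described above is given below. It assumes reduced units in which the Boltzmann constant equals 1 and takes the temperature from the kinetic energy over 3N degrees of freedom; the function names and the number of scaling steps are hypothetical. In an actual equilibration run, dynamics steps would be integrated between successive rescalings, which this sketch omits.

```python
import math

def kinetic_temperature(masses, velocities, k_b=1.0):
    """Instantaneous temperature from the kinetic energy, T = 2 KE / (3 N k_B)."""
    ke = 0.5 * sum(m * sum(v * v for v in vel)
                   for m, vel in zip(masses, velocities))
    return 2.0 * ke / (3 * len(masses) * k_b)

def rescale(velocities, t_current, t_target):
    """Scale every velocity by sqrt(T_target / T_current)."""
    s = math.sqrt(t_target / t_current)
    return [[s * v for v in vel] for vel in velocities]

def heat_stepwise(masses, velocities, t_target, n_steps=10):
    """Move the temperature from its starting value to t_target in equal steps."""
    t_start = kinetic_temperature(masses, velocities)
    for step in range(1, n_steps + 1):
        t_step = t_start + (t_target - t_start) * step / n_steps
        t_now = kinetic_temperature(masses, velocities)
        velocities = rescale(velocities, t_now, t_step)
        # A real protocol would run molecular dynamics here before rescaling again.
    return velocities
```

The same routine "cools" the system when the target temperature is below the starting temperature, since the scale factor is then less than one.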

Further energy minimization routines can also be carried out. For example, a second class of methods 
involves calculating approximate solutions to the constrained equations of motion (EOM) for the protein. These methods use an iterative
approach to solve for the Lagrange multipliers and, typically, only need a few iterations if the corrections required are
small. The most popular method of this type, SHAKE (Ryckaert et al. (1977) J Comput Phys 23:327; and Van
Gunsteren et al. (1977) Mol Phys 34:1311) is easy to implement and scales as O(N) as the number of constraints
increases. Therefore, the method is applicable to macromolecules such as F-box proteins. An alternative method, 
RATTLE (Anderson (1983) J Comput Phys 52:24) is based on the velocity version of the Verlet algorithm. Like 
SHAKE, RATTLE is an iterative algorithm and can be used to energy minimize the model of the subject protein. 
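
The iterative solution for the Lagrange multipliers may be illustrated with a SHAKE-style correction for fixed-length distance constraints. This is a sketch under stated assumptions rather than the published implementation: positions after an unconstrained update are corrected along the constraint vectors taken from the previous step, weighted by inverse mass, until every constrained distance is restored within a relative tolerance. The function name, argument layout and tolerance are hypothetical.

```python
import math

def shake(old, new, masses, constraints, tol=1e-10, max_iter=500):
    """Correct `new` positions so each (i, j, d) constraint has |r_i - r_j| = d.

    `old` holds the positions before the unconstrained update; its constraint
    vectors give the directions along which the corrections are applied.
    """
    new = [list(p) for p in new]
    for _ in range(max_iter):
        converged = True
        for i, j, d in constraints:
            s = [new[i][k] - new[j][k] for k in range(3)]
            diff = d * d - sum(c * c for c in s)
            if abs(diff) > tol * d * d:
                converged = False
                r_old = [old[i][k] - old[j][k] for k in range(3)]
                dot = sum(a * b for a, b in zip(s, r_old))
                # Linearized multiplier for this constraint.
                g = diff / (2.0 * (1.0 / masses[i] + 1.0 / masses[j]) * dot)
                for k in range(3):
                    new[i][k] += g * r_old[k] / masses[i]
                    new[j][k] -= g * r_old[k] / masses[j]
        if converged:
            return new
    raise RuntimeError("SHAKE did not converge")
```

Each pass visits every constraint once, so the cost per iteration grows linearly with the number of constraints, consistent with the O(N) scaling noted above. Because the two corrections are equal and opposite when weighted by mass, the centre of mass of each constrained pair is unchanged.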

Overlays and superpositioning with a three dimensional model of a binding pocket of the invention may be
used for modelling optimisation. Additionally, alignment and/or modelling can be used as a guide for the placement of
mutations on a binding pocket to characterize the nature of the site in the context of a cell. 

The three dimensional structure of a new crystal may be modelled using molecular replacement. The term
"molecular replacement" refers to a method that involves generating a preliminary model of a molecule or complex
whose structural coordinates are unknown, by orienting and positioning a molecule whose structural coordinates are
known within the unit cell of the unknown crystal, so as best to account for the observed diffraction pattern of the 
unknown crystal. Phases can then be calculated from this model and combined with the observed amplitudes to give 
an approximate Fourier synthesis of the structure whose coordinates are unknown. This, in turn, can be subject to any 
of the several forms of refinement to provide a final, accurate structure of the unknown crystal (see Lattman, E., "Use of
the Rotation and Translation Functions", in Methods in Enzymology, 115, pp. 55-77 (1985); M. G. Rossmann, ed.,
"The Molecular Replacement Method", Int. Sci. Rev. Ser., No. 13, Gordon & Breach, New York, (1972)).

Commonly used computer software packages for molecular replacement are X-PLOR (Brunger 1992, Nature 
355: 472-475), AMoRE (Navaza, 1994, Acta Crystallogr. A50: 157-163), the CCP4 package (Collaborative
Computational Project, Number 4, "The CCP4 Suite: Programs for Protein Crystallography", Acta Cryst., Vol. D50,
pp. 760-763, 1994), the MERLOT package (P.M.D. Fitzgerald, J. Appl. Cryst., Vol. 21, pp. 273-278, 1988) and
XTALVIEW (McCree et al (1992) J. Mol. Graphics 10: 44-46). It is preferable that the resulting structure not exhibit a
root-mean-square deviation of more than 3 Å.
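
For illustration, a root-mean-square deviation of the kind referred to above may be computed as follows. The sketch assumes that the two coordinate sets list equivalent atoms in the same order and have already been superimposed; it performs no fitting itself, and the function name is hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation, in the units of the input (e.g. angstroms)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = sum((p - q) ** 2
                for a, b in zip(coords_a, coords_b)
                for p, q in zip(a, b))
    return math.sqrt(total / len(coords_a))
```

A result of 3.0 or less for coordinates in angstroms would satisfy the preference stated above.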

Molecular replacement computer programs generally involve the following steps: (1) determining the
number of molecules in the unit cell and defining the angles between them (self rotation function); (2) rotating the
known structure against diffraction data to define the orientation of the molecules in the unit cell (rotation function);
(3) translating the known structure in three dimensions to correctly position the molecules in the unit cell (translation
function); (4) determining the phases of the X-ray diffraction data and calculating an R-factor from the
reference data set and from the new data, wherein an R-factor between 30-50% indicates that the orientations of the
atoms in the unit cell have been reasonably determined by the method; and (5) optionally, decreasing the R-factor to
about 20% by refining the new electron density map using iterative refinement techniques known to those skilled in 
the art (refinement). 
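
The R-factor referred to in steps (4) and (5) is conventionally the summed absolute difference between observed and calculated structure-factor amplitudes divided by the summed observed amplitudes. The sketch below assumes that definition together with a single overall scale factor chosen so that the two sets have the same sum; the function name and the scaling choice are assumptions made for illustration.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R-factor as a fraction (multiply by 100 for percent)."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("amplitude lists must be non-empty and of equal length")
    scale = sum(f_obs) / sum(f_calc)  # overall scale applied to calculated amplitudes
    return sum(abs(o - scale * c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)
```

On this scale, the 30-50% range mentioned above corresponds to return values of 0.30 to 0.50, and the refined target of about 20% to a value near 0.20.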

The quality of the model may be analysed using a program such as PROCHECK or 3D-Profiler [Laskowski 
et al 1993 J. Appl. Cryst. 26:283-291; Luthy R, et al, Nature 356: 83-85, 1992; and Bowie, J.U. et al, Science 253: 
164-170, 1991]. Once any irregularities have been resolved, the entire structure may be further refined.

Other molecular modelling techniques may also be employed in accordance with this invention. See, e.g., 
Cohen, N. C. et al, "Molecular Modelling Software and Methods for Medicinal Chemistry", J. Med. Chem., 33, pp. 
883-894 (1990). See also, Navia, M. A. and M. A. Murcko, "The Use of Structural Information in Drug Design", 
Current Opinions in Structural Biology, 2, pp. 202-210 (1992). 

Using the structural coordinates of a crystal provided by the invention, molecular modelling may be used to
determine the structural coordinates of a crystalline mutant or homolog of an SCF complex or F-box binding pocket
involved in substrate selection and/or orientation. By the same token a crystal of the invention can be used to provide 
a model of a substrate or ligand. Modelling techniques can then be used to approximate the three dimensional 
structure of substrate or ligand derivatives and other components which may be able to mimic the atomic contacts 
between a substrate or ligand and binding pocket.

COMPUTER FORMAT OF CRYSTALS/MODELS 

Information derivable from a crystal of the present invention (for example the structural coordinates) and/or 
the model of the present invention may be provided in a computer-readable format. 

Therefore, the invention provides a computer readable medium or a machine readable storage medium which 
comprises the structural coordinates of a binding pocket of an SCF complex or F-box protein described herein
including all or any parts thereof, or substrates or ligands including portions thereof. Such a storage medium
encoded with these data is capable of displaying on a computer screen or similar viewing device, a
three-dimensional graphical representation of a molecule or molecular complex which comprises such binding pockets or
similarly shaped homologous binding pockets. Thus, the invention also provides computerized representations of the 
secondary or three-dimensional structures of a binding pocket of the invention, including any electronic, magnetic, or 
electromagnetic storage forms of the data needed to define the structures such that the data will be computer readable 
for purposes of display and/or manipulation. 
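
As one hedged example of a computer-readable representation, the sketch below reads atomic coordinates from fixed-column ATOM records of the kind used in Protein Data Bank files. It is an assumption that the coordinates of Table 6 are available in that layout; the function names are hypothetical.

```python
def parse_atom_record(line):
    """Parse one PDB-style ATOM record; return None for any other line."""
    if not line.startswith("ATOM"):
        return None
    return {
        "name": line[12:16].strip(),      # atom name, columns 13-16
        "res_name": line[17:20].strip(),  # residue name, columns 18-20
        "chain": line[21],                # chain identifier, column 22
        "res_seq": int(line[22:26]),      # residue number, columns 23-26
        # Orthogonal x, y, z in angstroms, columns 31-54.
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }

def alpha_carbons(lines):
    """Collect the records whose atom name is CA (the alpha-carbon trace)."""
    records = (parse_atom_record(line) for line in lines)
    return [r for r in records if r is not None and r["name"] == "CA"]
```

The resulting list of coordinate tuples can be passed to display or analysis routines, for example to compare the alpha-carbon positions of two binding pockets.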

In an aspect the invention provides a computer for producing a three-dimensional representation of a 
molecule or molecular complex, wherein said molecule or molecular complex comprises a binding pocket defined by 
structural coordinates of a binding pocket or structural coordinates of atoms of a substrate or ligand, or a three- 
dimensional representation of a homolog of said molecule or molecular complex, wherein said homolog comprises a 
binding pocket, or substrate or ligand that has a root mean square deviation from the backbone atoms not more than 
1.5 angstroms wherein said computer comprises: 

(a) a machine-readable data storage medium comprising a data storage material encoded with machine 
readable data wherein said data comprises the structural coordinates of a binding pocket or a 
substrate according to Table 6; 

(b) a working memory for storing instructions for processing said machine-readable data; 

(c) a central-processing unit coupled to said working memory and to said machine-readable data storage 
medium for processing said machine readable data into said three-dimensional representation; and 

(d) a display coupled to said central-processing unit for displaying said three-dimensional 
representation. 

The invention also provides a computer for determining at least a portion of the structural coordinates 
corresponding to an X-ray diffraction pattern of a molecule or molecular complex wherein said computer comprises: 

(a) a machine-readable data storage medium comprising a data storage material encoded with machine 
readable data wherein said data comprises the structural coordinates according to Table 6; 

(b) a machine-readable data storage medium comprising a data storage material encoded with machine 
readable data wherein said data comprises an X-ray diffraction pattern of said molecule or molecular 
complex; 

(c) a working memory for storing instructions for processing said machine-readable data of (a) and (b); 

(d) a central-processing unit coupled to said working memory and to said machine-readable data storage 
medium of (a) and (b) for performing a Fourier transform of the machine readable data of (a) and for 
processing said machine readable data of (b) into structural coordinates; and 

(e) a display coupled to said central-processing unit for displaying said structural coordinates of said
molecule or molecular complex.

STRUCTURAL STUDIES

The present invention also provides a method for determining the secondary and/or tertiary structures of a 
polypeptide or part or complexes thereof by using a crystal, or a model according to the present invention. The 
polypeptide or part thereof may be any polypeptide or part thereof for which the secondary and/or tertiary structure is
uncharacterised or incompletely characterised. In a preferred embodiment the polypeptide shares (or is predicted to
share) some structural or functional homology to a crystal of the present invention. For example, the polypeptide may 
show a degree of structural homology over some or all parts of the primary amino acid sequence. 

The polypeptide may be an F-box protein, or part thereof with a different specificity for a substrate. 
Alternatively (or in addition) the polypeptide may be an F-box protein from a different species. 

The polypeptide may be a mutant of a wild-type F-box protein. A mutant may arise naturally, or may be
made artificially (for example using molecular biology techniques). The mutant may also not be "made" at all in the
conventional sense, but merely tested theoretically using the model of the present invention. A mutant may or may not 
be functional. 

Thus, using a model of the present invention, the effect of a particular mutation on the overall two and/or 
three dimensional structure of an F-box protein or SCF complex or the interaction between a binding pocket of an
F-box protein or SCF complex and a substrate or ligand can be investigated.

Alternatively, the polypeptide may perform an analogous function or be suspected to show a similar 
mechanism to an F-box protein. 

The polypeptide may also be the same as the polypeptide of the crystal, but in association with a different 
substrate or ligand (for example, modulator or inhibitor) or cofactor. In this way it is possible to investigate the effect
of altering the substrate or ligand with which the polypeptide is associated on the structure of the binding pocket. 

Secondary or tertiary structure may be determined by applying the structural coordinates of a crystal or 
model of the present invention to other data such as an amino acid sequence, X-ray crystallographic diffraction data,
or nuclear magnetic resonance (NMR) data. Homology modeling, molecular replacement, and nuclear magnetic
resonance methods using these other data sets are described below.

Homology modeling (also known as comparative modeling or knowledge-based modeling) methods develop 
a three dimensional model from a polypeptide sequence based on the structures of known proteins (i.e. an F-box 
structure or complex thereof described herein). The method utilizes a computer model of a crystal of the present 
invention (the "known structure"), a computer representation of the amino acid sequence of the polypeptide with an 
unknown structure, and standard computer representations of the structures of amino acids. The method in particular
comprises the steps of: (a) identifying structurally conserved and variable regions in the known structure; (b) aligning
the amino acid sequences of the known structure and unknown structure; (c) generating co-ordinates of main chain
atoms and side chain atoms in structurally conserved and variable regions of the unknown structure based on the
coordinates of the known structure thereby obtaining a homology model; and (d) refining the homology model to
obtain a three dimensional structure for the unknown structure. This method is well known to those skilled in the art
(Greer, 1985, Science 228, 1055; Blundell et al 1988, Eur. J. Biochem. 172, 513; Knighton et al., 1992, Science
258:130-135, http://biochem.vt.edu/courses/modeling/homology.htn). Computer programs that can be used in
homology modelling are Quanta and the Homology module in the Insight II modelling package distributed by
Molecular Simulations Inc, or MODELLER (Rockefeller University, www.iucr.ac.uk/sinris-top/logical/prg-modeller.html).

In step (a) of the homology modelling method, a known structure is examined to identify the structurally 
conserved regions (SCRs) from which an average structure, or framework, can be constructed for these regions of the 
protein. Variable regions (VRs), in which known structures may differ in conformation, also must be identified. SCRs 
generally correspond to the elements of secondary structure, such as alpha-helices and beta-sheets, and to ligand- and 
substrate-binding sites. The VRs usually lie on the surface of the proteins and form the loops where the main chain
turns. 

Many methods are available for sequence alignment of known structures and unknown structures. Sequence 
alignments generally are based on the dynamic programming algorithm of Needleman and Wunsch [J. Mol. Biol. 48: 
442-453, 1970]. Current methods include FASTA, Smith-Waterman, and BLASTP, with the BLASTP method
differing from the other two in not allowing gaps. Scoring of alignments typically involves construction of a 20x20
matrix in which identical amino acids and those of similar character (i.e., conservative substitutions) may be scored
higher than those of different character. Substitution schemes which may be used to score alignments include the
scoring matrices PAM (Dayhoff et al., Meth. Enzymol. 91: 524-545, 1983), and BLOSUM (Henikoff and Henikoff,
Proc. Nat. Acad. Sci. USA 89: 10915-10919, 1992), and the matrices based on alignments derived from
three-dimensional structures including that of Johnson and Overington (JO matrices) (J. Mol. Biol. 233: 716-738, 1993).
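
The dynamic programming approach of Needleman and Wunsch may be sketched as a global alignment with traceback. For brevity the sketch assumes a flat match/mismatch score and a linear gap penalty in place of a 20x20 substitution matrix such as PAM or BLOSUM; the parameter values and function name are illustrative only.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Replacing `sub` with a lookup into a substitution matrix would give the matrix-based scoring described above without changing the recurrence.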

Alignment based solely on sequence may be used; however, other structural features also may be taken into 
account. In Quanta, multiple sequence alignment algorithms are available that may be used when aligning a sequence 
of the unknown with the known structures. Four scoring systems (i.e. sequence homology, secondary structure 
homology, residue accessibility homology, CA-CA distance homology) are available, each of which may be evaluated
during an alignment so that relative statistical weights may be assigned.

When generating coordinates for the unknown structure, main chain atoms and side chain atoms, both in 
SCRs and VRs need to be modelled. A variety of approaches known to those skilled in the art may be used to assign 
co-ordinates to the unknown. In particular, the co-ordinates of the main chain atoms of SCRs will be transferred to the 
unknown structure. VRs correspond most often to the loops on the surface of the polypeptide and if a loop in the
known structure is a good model for the unknown, then the main chain co-ordinates of the known structure may be
copied. Side chain coordinates of SCRs and VRs are copied if the residue type in the unknown is identical to or very
similar to that in the known structure. For other side chain coordinates, a side chain rotamer library may be used to
define the side chain coordinates. When a good model for a loop cannot be found, fragment databases may be searched
for loops in other proteins that may provide a suitable model for the unknown. If desired, the loop may then be
subjected to conformational searching to identify low energy conformers.

Once a homology model has been generated it is analyzed to determine its correctness. A computer program
available to assist in this analysis is the Protein Health module in Quanta which provides a variety of tests. Other 
programs that provide structure analysis along with output include PROCHECK and 3D-Profiler [Luthy R. et al, 
Nature 356: 83-85, 1992; and Bowie, J.U. et al, Science 253: 164-170, 1991]. Once any irregularities have been 
resolved, the entire structure may be further refined. Refinement may consist of energy minimization with restraints,
especially for the SCRs. Restraints may be gradually removed for subsequent minimizations. Molecular dynamics 
may also be applied in conjunction with energy minimization. 

Molecular replacement involves applying a known structure to solve the X-ray crystallographic data set of a 
polypeptide of unknown structure. The method can be used to define the phases describing the X-ray diffraction data
of a polypeptide of unknown structure when only the amplitudes are known. Thus in an embodiment of the
invention, a method is provided for determining three dimensional structures of polypeptides with unknown structure
by (a) applying the structural coordinates of a crystal of the present invention to provide an X-ray crystallographic data
set for a polypeptide of unknown structure, and (b) determining a low energy conformation of the resulting structure.

The structural coordinates of a crystal of the present invention may be applied to nuclear magnetic resonance
(NMR) data to determine the three dimensional structures of polypeptides with uncharacterised or incompletely
characterised structure. (See for example, Wuthrich, 1986, John Wiley and Sons, New York: 176-199; Pflugrath et al.,
1986, J. Molecular Biology 189: 383-386; Kline et al., 1986 J. Molecular Biology 189:377-382). While the secondary 
structure of a polypeptide may often be determined by NMR data, the spatial connections between individual pieces of 
secondary structure are not as readily determined. The structural coordinates of a polypeptide defined by X-ray
crystallography can guide the NMR spectroscopist to an understanding of the spatial interactions between secondary
structural elements in a polypeptide of related structure. Information on spatial interactions between secondary 
structural elements can greatly simplify Nuclear Overhauser Effect (NOE) data from two-dimensional NMR 
experiments. In addition, applying the structural coordinates after the determination of secondary structure by NMR 
techniques simplifies the assignment of NOEs relating to particular amino acids in the polypeptide sequence and does
not greatly bias the NMR analysis of polypeptide structure.

In an embodiment, the invention relates to a method of determining three dimensional structures of 
polypeptides with unknown structures, by applying the structural coordinates of a crystal of the present invention to 
nuclear magnetic resonance (NMR) data of the unknown structure. This method comprises the steps of: (a) 
determining the secondary structure of an unknown structure using NMR data; and (b) simplifying the assignment of
through-space interactions of amino acids. The term "through-space interactions" defines the orientation of the
secondary structural elements in the three dimensional structure and the distances between amino acids from different 
portions of the amino acid sequence. The term "assignment" defines a method of analyzing NMR data and identifying 
which amino acids give rise to signals in the NMR spectrum.

SCREENING METHODS

Another aspect of the present invention is the design and identification of agents that inhibit or potentiate an 
interaction between an F-box protein or an SCF E3 ubiquitin ligase and a substrate. The rational design and
identification of agents can be accomplished by utilizing the structural coordinates that define a binding pocket of the 
present invention involved in substrate selection and/or orientation. 

The structures described herein, and the structures of other polypeptides determined by homology modeling, 
molecular replacement, and NMR techniques described herein can also be applied to modulator design and 
identification methods. 

The invention contemplates molecular models, in particular three-dimensional molecular models of binding 
pockets of the present invention involved in substrate selection and/or orientation, and their use as templates for the 
design of agents able to mimic or inhibit substrate binding (e.g. modulators). 

In certain embodiments, the present invention provides a method of screening for a ligand that associates 
with a binding pocket and/or modulates the function of an F-box protein or SCF complex by using a crystal or a model
according to the present invention. The method may involve investigating whether a test compound is capable of 
associating with or binding a binding pocket, and/or inhibiting or enhancing interactions of atomic contacts in a 
binding pocket. 

In accordance with an aspect of the present invention, a method is provided for screening for a ligand capable 
of binding to a binding pocket, wherein the method comprises using a crystal or model according to the invention. 

In another aspect, the invention relates to a method of screening for a ligand capable of binding to a binding 
pocket, wherein the binding pocket is defined by the structural coordinates given herein, the method comprising 
contacting the binding pocket with a test compound and determining if the test compound binds to the binding pocket. 

In one embodiment, the present invention provides a method of screening for a test compound capable of 
interacting with one or more key amino acid residues of a binding pocket of the present invention. For example, a test 
compound that interacts with one or more of amino acids of a binding pocket may prevent interaction of the F-box 
protein or complex thereof and its substrate resulting in modification of the SCF E3 ubiquitin ligase activity. 

Another aspect of the invention provides a process comprising the steps of: 

(a) performing a method of screening for a ligand described above; 

(b) identifying one or more ligands capable of binding to a binding pocket; and 

(c) preparing a quantity of said one or more ligands. 

A further aspect of the invention provides a process comprising the steps of:

(a) performing a method of screening for a ligand as described above; 

(b) identifying one or more ligands capable of binding to a binding pocket; and 

(c) preparing a pharmaceutical composition comprising said one or more ligands. 

Once a test compound capable of interacting with one or more key amino acid residues in a binding pocket of 
the present invention has been identified, further steps may be carried out either to select and/or modify compounds
and/or to modify existing compounds, to modulate the interaction with the key amino acid residues in the binding
pocket. 

Yet another aspect of the invention provides a process comprising the steps of:

(a) performing the method of screening for a ligand as described above;

(b) identifying one or more ligands capable of binding to a binding pocket;

(c) modifying said one or more ligands capable of binding to a binding pocket; 

(d) performing said method of screening for a ligand as described above; and 

(e) optionally preparing a pharmaceutical composition comprising said one or more ligands. 

In another aspect of the invention, a method of screening for a test compound is provided comprising
screening for test compounds that affect (inhibit or potentiate) an interaction between an F-box protein or SCF
complex and a substrate as defined by interactions 1 to 4 or 5 to 8/9 in Table 3 or Table 4. 

As used herein, the term "test compound" means any compound which is potentially capable of associating 
with a binding pocket, or inhibiting or enhancing interactions of atomic contacts in a binding pocket. If, after testing, it
is determined that the test compound does bind to the binding pocket, or inhibits or enhances interactions of atomic
contacts in a binding pocket, it is known as a "ligand".

The test compound may be designed or obtained from a library of compounds which may comprise peptides, 
as well as other compounds, such as small organic molecules and particularly new lead compounds. By way of 
example, the test compound may be a natural substance, a biological macromolecule, or an extract made from 
biological materials such as bacteria, fungi, or animal (particularly mammalian) cells or tissues, an organic or an
inorganic molecule, a synthetic test compound, a semi-synthetic test compound, a carbohydrate, a monosaccharide, an
oligosaccharide or polysaccharide, a glycolipid, a glycopeptide, a saponin, a heterocyclic compound, a structural or 
functional mimetic, a peptide, a peptidomimetic, a derivatised test compound, a peptide cleaved from a whole protein, 
or a peptide produced synthetically (such as, by way of example, either using a peptide synthesizer or by
recombinant techniques or combinations thereof), a recombinant test compound, a natural or a non-natural test
compound, a fusion protein or equivalent thereof and mutants, derivatives or combinations thereof.

The increasing availability of biomacromolecule structures of potential pharmacophoric molecules that have 
been solved crystallographically has prompted the development of a variety of direct computational methods for 
molecular design, in which the steric and electronic properties of substrate binding sites are used to guide the design of
potential ligands (Cohen et al. (1990) J. Med. Chem. 33: 883-894; Kuntz et al. (1982) J. Mol Biol 161: 269-288;
DesJarlais (1988) J. Med. Chem. 31: 722-729; Bartlett et al. (1989) (Spec. Publ., Roy. Soc. Chem.) 78: 182-196;
Goodford et al. (1985) J. Med. Chem. 28: 849-857; DesJarlais et al. J. Med. Chem. 29: 2149-2153). Directed methods
generally fall into two categories: (1) design by analogy in which 3-D structures of known molecules (such as from a
crystallographic database) are docked to the structure and scored for goodness-of-fit; and (2) de novo design, in which
the ligand model is constructed piecewise. The latter approach, in particular, can facilitate the development of novel
molecules, uniquely designed to bind to the subject binding pockets.

The test compound may be screened as part of a library or a database of molecules. Modulators of a binding
pocket of the present invention may be identified by docking a computer representation of compounds from one or
more databases of molecules. Databases which may be used include ACD (Molecular Designs Limited), NCI
(National Cancer Institute), CCDC (Cambridge Crystallographic Data Center), CAST (Chemical Abstract Service),
Derwent (Derwent Information Limited), Maybridge (Maybridge Chemical Company Ltd), Aldrich (Aldrich
Chemical Company), DOCK (University of California in San Francisco), and the Directory of Natural Products
(Chapman & Hall). Computer programs such as CONCORD (Tripos Associates) or DB-Converter (Molecular
Simulations Limited) can be used to convert a data set represented in two dimensions to one represented in three
dimensions.

Test compounds may be tested for their capacity to fit spatially into a binding pocket. As used herein, the term
"fits spatially" means that the three-dimensional structure of the test compound is accommodated geometrically in a
cavity of a binding pocket. The test compound can then be considered to be a ligand. 

A favourable geometric fit occurs when the surface area of the test compound is in close proximity with the 
surface area of the cavity of a binding pocket without forming unfavourable interactions. A favourable complementary
interaction occurs where the test compound interacts by hydrophobic, aromatic, ionic, dipolar, or hydrogen donating
and accepting forces. Unfavourable interactions may be steric hindrance between atoms in the test compound and 
atoms in the binding pocket. 
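
By way of illustration only, the geometric criterion described above may be sketched as a simple distance test. The following minimal Python sketch counts close contacts and steric clashes between the atoms of a test compound and the atoms of a binding pocket. The function name fit_summary, the 3.0 angstrom clash threshold, and the 4.5 angstrom contact threshold are assumptions made for this example and are not taken from the present disclosure.

```python
import math

def fit_summary(ligand_atoms, pocket_atoms, clash=3.0, contact=4.5):
    """Count steric clashes and close contacts between two atom sets.

    Atoms are (x, y, z) tuples in angstroms. Both thresholds are
    illustrative assumptions, not values from the disclosure.
    """
    clashes = contacts = 0
    for a in ligand_atoms:
        for b in pocket_atoms:
            d = math.dist(a, b)
            if d < clash:
                clashes += 1      # atoms overlap: unfavourable
            elif d <= contact:
                contacts += 1     # atoms in close proximity: favourable
    return clashes, contacts

# Toy coordinates, purely hypothetical.
pocket = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
ligand = [(0.0, 3.5, 0.0), (4.0, 2.0, 0.0)]
clashes, contacts = fit_summary(ligand, pocket)
```

A pose with many contacts and no clashes would be treated as a favourable geometric fit under these assumed thresholds; complementary chemical interactions would still need to be assessed separately.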

If a model of the present invention is a computer model, the test compounds may be positioned in a binding 
pocket through computational docking. If, on the other hand, the model of the present invention is a structural model, 
the test compounds may be positioned in the binding pocket by, for example, manual docking. 

As used herein the term "docking" refers to a process of placing a compound in close proximity with a 
binding pocket, or a process of finding low energy conformations of a test compound/binding pocket complex. 

A screening method of the present invention may comprise the following steps: 

(i) generating a computer model of a binding pocket using a crystal according to the invention; 

(ii) docking a computer representation of a test compound with the computer model; and 

(iii) analysing the fit of the compound in the binding pocket. 

In an aspect of the invention, a method is provided comprising the following steps: 

(a) docking a computer representation of a structure of a test compound into a computer representation 
of a binding pocket defined in accordance with the invention using a computer program, or by 
interactively moving the representation of the test compound into the representation of the binding 
pocket; 

(b) characterizing the geometry and the complementary interactions formed between the atoms of the 
binding pocket and the compound; optionally 

(c) searching libraries for molecular fragments which can fit into the empty space between the 
compound and the binding pocket and can be linked to the compound; and 

(d) linking the fragments found in (c) to the compound and evaluating the new modified compound. 

In an embodiment of the invention, a method is provided which comprises the following steps: 

(a) docking a computer representation of a test compound from a computer database with a computer 
representation of a selected binding pocket defined in accordance with the invention to define a 
complex; 

(b) determining a conformation of the complex with a favourable fit and favourable complementary 
interactions; and 

(c) identifying test compounds that best fit the selected binding pocket as potential modulators of an 
F-box protein or SCF complex comprising the binding pocket. 

In another embodiment of the invention, a method is provided which comprises docking a computer 
representation of a test compound with a computer representation of a selected binding pocket defined by the 
atomic interactions, atomic contacts, or structural coordinates in accordance with the invention to define a complex. 
In particular, a method is provided comprising: 

(a) docking a computer representation of a test compound from a computer database with a computer 
representation of a selected binding pocket defined by the atomic interactions, atomic contacts, or 
structural coordinates described herein; 

(b) determining a conformation of the complex with a favourable fit and favourable complementary 
interactions; and 

(c) identifying test compounds that best fit the selected binding pocket as potential modulators of an 
F-box protein or SCF complex comprising the binding pocket. 

A model used in a screening method may comprise a binding pocket either alone or in association with one 
or more ligands and/or cofactors. For example, the model may comprise the binding pocket in association with a 
substrate (or analogue thereof), and/or modulator. 

If the model comprises an unassociated binding pocket, then the selected site under investigation may be the 
binding pocket itself. The test compound may, for example, mimic a known ligand (e.g. substrate) for an F-box 
protein in order to interact with the binding pocket. The selected site may alternatively be another site on the F-box 
protein. 

If the model comprises an associated binding pocket, for example a binding pocket in association with a 
substrate or ligand, the selected site may be the binding pocket or a site made up of the binding pocket and the 
complexed substrate or ligand, or a site on the substrate or ligand itself. The test compound may be investigated for 
its capacity to modulate the interaction with the associated molecule. 

The screening methods described herein may be applied to a plurality of test compounds, to identify those 
that best fit the selected site. A test compound (or plurality of test compounds) may be selected on the basis of their 
similarity to a substrate or ligand for an F-box protein. For example, the screening method may comprise the 
following steps: 

(i) generating a computer model of a binding pocket in complex with a substrate or ligand; 

(ii) searching for a test compound with a similar three-dimensional structure and/or similar chemical 
groups as the substrate or ligand; and 

(iii) evaluating the fit of the test compound in the binding pocket. 

Searching may be carried out using a database of computer representations of potential compounds, using 
methods known in the art. 

The present invention also provides a method for designing ligands for F-box proteins or SCF complexes. It 
is well known in the art to use a screening method as described above to identify a test compound with promising fit, 
and then to use this test compound as a starting point to design a ligand with improved fit to the model. Such 
techniques are known as "structure-based ligand design" (see Kuntz et al., 1994, Acc. Chem. Res. 27:117; Guida, 
1994, Current Opinion in Struc. Biol. 4:777; and Colman, 1994, Current Opinion in Struc. Biol. 4:868, for reviews of 
structure-based drug design and identification; and Kuntz et al., 1982, J. Mol. Biol. 161:269; Kuntz et al., 1994, Acc. 
Chem. Res. 27:117; Meng et al., 1992, J. Comput. Chem. 13:505; Bohm, 1994, J. Comp. Aided Molec. Design 8:623, 
for methods of structure-based modulator design). 

Examples of computer programs that may be used for structure-based ligand design are CAVEAT (Bartlett et 
al., 1989, in "Chemical and Biological Problems in Molecular Recognition", Roberts, S.M.; Ley, S.V.; Campbell, 
N.M. eds; Royal Society of Chemistry: Cambridge, pp 182-196); FLOG (Miller et al., 1994, J. Comp. Aided Molec. 
Design 8:153); PRO Modulator (Clark et al., 1995, J. Comp. Aided Molec. Design 9:13); MCSS (Miranker and 
Karplus, 1991, Proteins: Structure, Function, and Genetics 11:29); and GRID (Goodford, 1985, J. Med. Chem. 
28:849). 

The method may comprise the following steps: 

(i) docking a model of a test compound with a model of a binding pocket; 

(ii) identifying one or more groups on the test compound which may be modified to improve their fit in 
the binding pocket; 

(iii) replacing one or more identified groups to produce a modified test compound model; and 

(iv) docking the modified test compound model with the model of the binding pocket. 

Evaluation of fit may comprise the following steps: 

(a) mapping chemical features of a test compound, such as hydrogen bond donors or acceptors, 
hydrophobic/lipophilic sites, positively ionizable sites, or negatively ionizable sites; and 

(b) adding geometric constraints to selected mapped features. 

The fit of the modified test compound may then be evaluated using the same criteria. 
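
As a hedged illustration of steps (a) and (b), mapped chemical features may be represented as labelled points, and geometric constraints as allowed distance ranges between feature types. The Python sketch below is one possible representation only; the function name satisfies, the feature labels, and the distance ranges are hypothetical and are not prescribed by the present disclosure.

```python
import math

def satisfies(features, constraints):
    """Return True if every distance constraint is met by some feature pair.

    features: list of (kind, (x, y, z)) mapped on the test compound.
    constraints: list of (kind_a, kind_b, d_min, d_max) in angstroms.
    All labels and ranges are hypothetical examples.
    """
    for kind_a, kind_b, d_min, d_max in constraints:
        met = any(
            d_min <= math.dist(pos_a, pos_b) <= d_max
            for ka, pos_a in features if ka == kind_a
            for kb, pos_b in features if kb == kind_b
        )
        if not met:
            return False
    return True

# (a) mapped features of a toy compound
features = [
    ("donor", (0.0, 0.0, 0.0)),
    ("acceptor", (3.0, 4.0, 0.0)),
    ("hydrophobe", (0.0, 0.0, 6.0)),
]
# (b) geometric constraints added to selected features
constraints = [
    ("donor", "acceptor", 4.5, 5.5),
    ("donor", "hydrophobe", 5.0, 7.0),
]
ok = satisfies(features, constraints)
```

The same check could be applied unchanged to a modified test compound, so that the original and modified compounds are evaluated against identical criteria.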

The chemical modification of a group may either enhance or reduce hydrogen bonding interaction, charge 
interaction, hydrophobic interaction, van der Waals interaction or dipole interaction between the test compound and 
the key amino acid residue(s) of the binding pocket. Preferably the group modifications involve the addition, removal, 
or replacement of substituents on the test compound such that the substituents are positioned to collide or to bind 
preferentially with one or more amino acid residues that correspond to the key amino acid residues of the binding 
pocket. 

If a modified test compound model has an improved fit, then it may bind to a binding pocket and be 
considered to be a "ligand". Rational modification of groups may be made with the aid of libraries of molecular 
fragments which may be screened for their capacity to fit into the available space and to interact with the appropriate 
atoms. Databases of computer representations of libraries of chemical groups are available commercially for this 
purpose. 

The test compound may also be modified "in situ" (i.e. once docked into the potential binding pocket), 
enabling immediate evaluation of the effect of replacing selected groups. The computer representation of the test 
compound may be modified by deleting a chemical group or groups, or by adding a chemical group or groups. After 
each modification to a compound, the atoms of the modified compound and potential binding pocket can be shifted in 
conformation and the distance between the modulator and the binding pocket atoms may be scored on the basis of 
geometric fit and favourable complementary interactions between the molecules. This technique is described in detail 
in Molecular Simulations User Manual, 1995, in LUDI. 

Examples of ligand building and/or searching computer programs include programs in the Molecular 
Simulations Package (Catalyst), ISIS/HOST, ISIS/BASE, and ISIS/DRAW (Molecular Designs Limited), and UNITY 
(Tripos Associates). 

The "starting point" for rational ligand design may be a known substrate or ligand. For example, in order to 
identify potential modulators of an F-box protein, a logical approach would be to start with a known ligand or 
substrate to produce a molecule which mimics the binding of the ligand or substrate. Such a molecule may, for 
example, act as a competitive inhibitor for the true substrate or ligand, or may bind so strongly that the interaction 
(and inhibition) is effectively irreversible. 

Such a method may comprise the following steps: 

(i) generating a computer model of a binding pocket in complex with a substrate or ligand; 

(ii) replacing one or more groups on the ligand model to produce a modified substrate or ligand; and 

(iii) evaluating the fit of the modified substrate or ligand in the binding pocket. 

The replacement groups could be selected and replaced using a compound construction program which 
replaces computer representations of chemical groups with groups from a computer database, where the 
representations of the compounds are defined by structural coordinates. 

In an embodiment, a screening method is provided for identifying a substrate or ligand of an F-box protein, 
comprising the step of using the structural coordinates of a CPD motif defined in relation to its spatial association with 
a binding pocket of the invention, to generate a compound that is capable of associating with the binding pocket. 

In an embodiment of the invention, a screening method is provided for identifying a ligand of an F-box 
protein, in particular a cdc4 protein, comprising the step of using the structural coordinates of the CPD motif listed in 
Table 6 to generate a compound for associating with a binding pocket of an F-box protein, in particular a cdc4 protein 
as described herein. The following steps are employed in a particular method of the invention: (a) generating a 
computer representation of a CPD motif defined by its structural coordinates listed in Table 6; and (b) searching for 
molecules in a database that are structurally or chemically similar to the defined CPD motif, using a searching 
computer program, or replacing portions of the CPD motif with similar chemical structures from a database using a 
compound building computer program. 

A screening method is provided for identifying a ligand of an F-box protein, in particular a cdc4 protein, or an 
SCF complex comprising the step of using the structural coordinates of a binding pocket comprising a WD40 repeat 
or part thereof listed in Table 6 to generate a compound for associating with an F-box domain of an F-box protein. The 
following steps are employed in a particular method of the invention: (a) generating a computer representation of a 
binding pocket comprising a WD40 repeat region or part thereof defined by its structural coordinates listed in Table 6; 
and (b) searching for molecules in a database that are structurally or chemically similar to the defined binding pocket 
using a searching computer program, or replacing portions of the binding pocket with structures from a database using 
a compound building computer program. 

A screening method is provided for identifying a ligand of an F-box protein, in particular a cdc4 protein, or an 
SCF complex comprising the step of using the structural coordinates of a binding pocket comprising an F-box domain 
or part thereof, or helical linker listed in Table 6 to generate a compound for associating with an F-box domain or 
helical linker of an F-box protein. The following steps are employed in a particular method of the invention: (a) 
generating a computer representation of a binding pocket comprising an F-box domain or part thereof, or helical 
linker defined by its structural coordinates listed in Table 4; and (b) searching for molecules in a database that are 
structurally or chemically similar to the defined binding pocket using a searching computer program, or replacing 
portions of the binding pocket with structures from a database using a compound building computer program. 

The screening methods of the present invention may be used to identify compounds or entities that associate 
with a molecule that associates with an F-box protein, in particular a cdc4 protein, or an SCF complex. 

In an illustrative embodiment, the design of potential modulators or substrates for SCF complexes begins 
from the general perspective of shape complementarity for an active site and substrate specificity subsites of the 
receptor, and a search algorithm is employed which is capable of scanning a database of small molecules of known 
three-dimensional structure for candidates which fit geometrically into the target protein site. It is not expected that 
the molecules found in the shape search will necessarily be leads themselves, since no evaluation of chemical 
interaction need necessarily be made during the initial search. Rather, it is anticipated that such candidates might act 
as the framework for further design, providing molecular skeletons to which appropriate atomic replacements can be 
made. Of course, the chemical complementarity of these molecules can be evaluated, but it is expected that atom 
types will be changed to maximize the electrostatic, hydrogen bonding, and hydrophobic interactions with the 
receptor. Most algorithms of this type provide a method for finding a wide assortment of chemical structures that are 
complementary to the shape of a binding site of a subject molecule or complex. Each of a set of small molecules from 
a particular database, such as the Cambridge Crystallographic Data Bank (CCDB) (Allen et al. (1973) J. Chem. Doc. 
13: 119), is individually docked to the binding pocket of the invention, in a number of geometrically permissible 
orientations with use of a docking algorithm. In a preferred embodiment, a set of computer algorithms called DOCK 
can be used to characterize the shape of invaginations and grooves that form active sites and recognition surfaces of a 
subject molecule or complex (Kuntz et al. (1982) J Mol Biol 161: 269-288). The program can also search a database 
of small molecules for templates whose shapes are complementary to particular binding pockets or sites of a receptor 
(DesJarlais et al. (1988) J Med Chem 31: 722-729). These templates normally require modification to achieve good 
chemical and electrostatic interactions (DesJarlais et al. (1989) ACS Symp Ser 413: 60-69). However, the program has 
been shown to position accurately known cofactors for ligands based on shape constraints alone. 

The orientations are evaluated for goodness-of-fit and the best are kept for further examination using 
molecular mechanics programs, such as AMBER or CHARMM. Such algorithms have previously proven successful 
in finding a variety of molecules that are complementary in shape to a given binding site of a molecule or complex, 
and have been shown to have several attractive features. First, such algorithms can retrieve a remarkable diversity of 
molecular architectures. Second, the best structures have, in previous applications to other proteins, demonstrated 
impressive shape complementarity over an extended surface area. Third, the overall approach appears to be quite 
robust with respect to small uncertainties in positioning of the candidate atoms. 

Goodford (1985, J Med Chem 28:849-857) and Boobbyer et al. (1989, J Med Chem 32:1083-1094) have 
produced a computer program (GRID) which seeks to determine regions of high affinity for different chemical groups 
(termed probes) on the molecular surface of the binding site. GRID hence provides a tool for suggesting 
modifications to known ligands that might enhance binding. It may be anticipated that some of the sites discerned by 
GRID as regions of high affinity correspond to "pharmacophoric patterns" determined inferentially from a series of 
known ligands. As used herein, a pharmacophoric pattern is a geometric arrangement of features of the anticipated 
ligand that is believed to be important for binding. Attempts have been made to use pharmacophoric patterns as a 
search screen for novel ligands (Jakes et al. (1987) J Mol Graph 5:41-48; Brint et al. (1987) J Mol Graph 5:49-56; 
Jakes et al. (1986) J Mol Graph 4:12-20); however, the constraint of steric and "chemical" fit in the putative (and 
possibly unknown) binding pocket or site is ignored. Goodsell and Olson (1990, Proteins: Struct Funct Genet 8:195- 
202) have used the Metropolis (simulated annealing) algorithm to dock a single known ligand into a target protein. 
They allow torsional flexibility in the ligand and use GRID interaction energy maps as rapid lookup tables for 
computing approximate interaction energies. Given the large number of degrees of freedom available to the ligand, 
the Metropolis algorithm is time-consuming and is unsuited to searching a candidate database of a few thousand small 
molecules. 

Yet a further embodiment of the present invention utilizes a computer algorithm such as CLIX which 
searches such databases as CCDB for small molecules which can be oriented in a binding pocket or site in a way that 
is both sterically acceptable and has a high likelihood of achieving favorable chemical interactions between the 
candidate molecule and the surrounding amino acid residues. The method is based on characterizing a binding pocket 
in terms of an ensemble of favorable binding positions for different chemical groups and then searching for 
orientations of the candidate molecules that cause maximum spatial coincidence of individual candidate chemical 
groups with members of the ensemble. The current availability of computer power dictates that a computer-based 
search for novel ligands follows a breadth-first strategy. A breadth-first strategy aims to reduce progressively the size 
of the potential candidate search space by the application of increasingly stringent criteria, as opposed to a depth-first 
strategy wherein a maximally detailed analysis of one candidate is performed before proceeding to the next. CLIX 
conforms to this strategy in that its analysis of binding is rudimentary: it seeks to satisfy the necessary conditions of 
steric fit and of having individual groups in "correct" places for bonding, without imposing the sufficient condition 
that favorable bonding interactions actually occur. A ranked "shortlist" of molecules, in their favored orientations, is 
produced which can then be examined on a molecule-by-molecule basis, using computer graphics and more 
sophisticated molecular modeling techniques. CLIX is also capable of suggesting changes to the substituent chemical 
groups of the candidate molecules that might enhance binding. 
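
The breadth-first strategy described above may be illustrated, in a purely schematic way, as a cascade of increasingly stringent filters applied to the whole candidate set, each filter acting only on the survivors of the previous one. The Python sketch below is not the CLIX implementation; the function name breadth_first_screen, the candidate fields, and the two example criteria are assumptions chosen for illustration.

```python
def breadth_first_screen(candidates, filters):
    """Apply each filter to all survivors before moving to the next filter.

    Returns the final shortlist and the number of survivors after each
    stage, so that the progressive reduction of the search space is visible.
    """
    survivors = list(candidates)
    history = [len(survivors)]
    for keep in filters:
        survivors = [c for c in survivors if keep(c)]
        history.append(len(survivors))
    return survivors, history

# Hypothetical precomputed properties of three database molecules.
candidates = [
    {"name": "m1", "fits_sterically": True, "sites_matched": 3},
    {"name": "m2", "fits_sterically": False, "sites_matched": 4},
    {"name": "m3", "fits_sterically": True, "sites_matched": 1},
]
filters = [
    lambda c: c["fits_sterically"],       # cheap, permissive criterion first
    lambda c: c["sites_matched"] >= 2,    # more stringent criterion second
]
shortlist, history = breadth_first_screen(candidates, filters)
```

In this toy run the candidate set shrinks from three molecules to two and then to one; a depth-first strategy would instead analyse each molecule exhaustively before considering the next.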

The algorithmic details of CLIX are described in Lawrence et al. (1992) Proteins 12:31-41, and the CLIX 
algorithm can be summarized as follows. The GRID program is used to determine discrete favorable interaction 
positions (termed target sites) in the binding pocket or site of the protein for a wide variety of representative chemical 
groups. For each candidate ligand in the CCDB an exhaustive attempt is made to make coincident, in a spatial sense 
in the binding site of the protein, a pair of the candidate's substituent chemical groups with a pair of corresponding 
favorable interaction sites proposed by GRID. All possible combinations of pairs of ligand groups with pairs of GRID 
sites are considered during this procedure. Upon locating such coincidence, the program rotates the candidate ligand 
about the two pairs of groups and checks for steric hindrance and coincidence of other candidate atomic groups with 
appropriate target sites. Particular candidate/orientation combinations that are good geometric fits in the binding site 
and show sufficient coincidence of atomic groups with GRID sites are retained. 
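
The pair-matching stage summarized above can be sketched as follows. Two ligand groups can be superimposed on two target sites only if the group types match the site types and the distance between the groups is close to the distance between the sites. The Python sketch below enumerates such pairings; it does not perform the subsequent rotation or steric check, and the function name coincident_pairs, the 0.5 angstrom tolerance, and the toy coordinates are assumptions rather than details of CLIX.

```python
import math
from itertools import combinations, permutations

def coincident_pairs(groups, sites, tol=0.5):
    """Enumerate (group_i, group_j, site_k, site_m) index tuples that could
    be made spatially coincident.

    groups, sites: lists of (kind, (x, y, z)). A pairing is accepted when
    the kinds match and the two separations differ by at most tol angstroms
    (tol is an illustrative assumption).
    """
    hits = []
    for (i, (kind_i, pos_i)), (j, (kind_j, pos_j)) in combinations(enumerate(groups), 2):
        d_groups = math.dist(pos_i, pos_j)
        for (k, (kind_k, pos_k)), (m, (kind_m, pos_m)) in permutations(enumerate(sites), 2):
            same_kinds = kind_i == kind_k and kind_j == kind_m
            if same_kinds and abs(d_groups - math.dist(pos_k, pos_m)) <= tol:
                hits.append((i, j, k, m))
    return hits

# Toy ligand groups and toy target sites, purely hypothetical.
groups = [("OH", (0.0, 0.0, 0.0)), ("NH3", (5.0, 0.0, 0.0)), ("CH3", (0.0, 2.0, 0.0))]
sites = [("NH3", (10.0, 10.0, 15.0)), ("OH", (10.0, 10.0, 10.0)), ("CH3", (20.0, 20.0, 20.0))]
hits = coincident_pairs(groups, sites)
```

Each accepted pairing would then be the starting point for rotating the candidate about the axis through the two matched groups and checking the remaining groups against the remaining target sites.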

Consistent with the breadth-first strategy, this approach involves simplifying assumptions. Rigid protein and 
small molecule geometry is maintained throughout. As a first approximation rigid geometry is acceptable as the 
energy minimized coordinates of a deduced structure describe an energy minimum for the molecule, albeit a local 
one. If the surface residues of the site of interest are not involved in crystal contacts then the crystal configuration of 
those residues is used merely as a starting point for energy minimization, and potential solution structures for those 
residues determined. The deduced structure should reasonably mimic the mean solution configuration. 

A further assumption implicit in CLIX is that the potential ligand, when introduced into the binding pocket or 
site of a receptor, does not induce change in the protein's stereochemistry or partial charge distribution and so alter the 
basis on which the GRID interaction energy maps were computed. It must also be stressed that the interaction sites 
predicted by GRID are used in a positional and type sense only, i.e., when a candidate atomic group is placed at a site 
predicted as favorable by GRID, no check is made to ensure that the bond geometry, the state of protonation, or the 
partial charge distribution favors a strong interaction between the protein and that group. Such detailed analysis 
should form part of more advanced modeling of candidates identified in the CLIX shortlist. 

Yet another embodiment of a computer-assisted molecular design method for identifying ligands of a binding 
pocket of the invention comprises the de novo synthesis of potential ligands by algorithmic connection of small 
molecular fragments that will exhibit the desired structural and electrostatic complementarity with an active site or 
binding pocket of the receptor. The methodology employs a large template set of small molecules which are iteratively 
pieced together in a model of a binding pocket. Each stage of ligand growth is evaluated according to a molecular 
mechanics-based energy function, which considers van der Waals and coulombic interactions, internal strain energy of 
the lengthening ligand, and desolvation of both ligand and receptor. The search space can be managed by use of a 
data tree that is kept under control by pruning according to the binding criteria. 

In an illustrative embodiment, the search space is limited to consider only amino acids and amino acid 
analogs as the molecular building blocks. Such a methodology generally employs a large template set of amino acid 
conformations, though it need not be restricted to just the 20 natural amino acids, as it can easily be extended to include 
other related fragments of interest to the medicinal chemist, e.g. amino acid analogs. The putative ligands that result 
from this construction method are peptides and peptide-like compounds rather than the small organic molecules that 
are typically the goal of drug design research. The appeal of the peptide building approach is not that peptides are 
preferable to organics as potential pharmaceutical agents, but rather that: (1) they can be generated relatively rapidly 
de novo; (2) their energetics can be studied by well-parameterized force field methods; (3) they are much easier to 
synthesize than are most organics; and (4) they can be used in a variety of ways, for peptidomimetic ligand design, 
protein-protein binding studies, and even as shape templates in the more commonly used 3D organic database search 
approach described above. 

Such a de novo peptide design method has been incorporated in a software package called GROW (Moon et 
al. (1991) Proteins 11:314-328). In a typical design session, standard interactive graphical modeling methods are 
employed to define the structural environment in which GROW is to operate. For instance, the environment could be an 
active site binding pocket of an F-box protein, or it could be a set of features on the protein's surface to which the user 
wishes to bind a peptide-like molecule. The GROW program then operates to generate a set of potential ligand 
molecules. Interactive modeling methods then come into play again, for examination of the resulting molecules, and 
for selection of one or more of them for further refinement. 

To illustrate, GROW operates on an atomic coordinate file generated by the user in the interactive modeling 
session, such as the coordinates provided in Table 4, or the coordinates of a binding pocket or active site as described 
in Tables 2 and 4 plus a small fragment (e.g., an acetyl group) positioned in the active site to provide a starting point 
for peptide growth. These are referred to as "site" atoms and "seed" atoms, respectively. A second file provided by 
the user contains a number of control parameters to guide the peptide growth (Moon et al. (1991) Proteins 11:314- 
328). 

The operation of the GROW algorithm is conceptually fairly simple. GROW proceeds in an iterative 
fashion, to systematically attach to the seed fragment each amino acid template in a large preconstructed library of 
amino acid conformations. When a template has been attached, it is scored for goodness-of-fit to the receptor site or 
binding pocket, and then the next template in the library is attached to the seed. After all the templates have been 
tested, only the highest scoring ones are retained for the next level of growth. This procedure is repeated for the 
second growth level; each library template is attached in turn to each of the bonded seed/amino acid molecules that 
were retained from the first step, and then scored. Again, only the best of the bonded seed/dipeptide molecules that 
result are retained for the third level of growth. The growth of peptides can proceed in the N-to-C direction only, the 
reverse direction only, or in alternating directions, depending on the initial control specifications supplied by the user. 
Successive growth levels therefore generate peptides that are lengthened by one residue. The procedure terminates 
when the user-defined peptide length has been reached, at which point the user can select from the constructed 
peptides those to be studied further. The resulting data provided by the GROW procedure includes not only residue 
sequences and scores, but also atomic coordinates of the peptides, related directly to the coordinate system of the 
binding site atoms. 
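
The level-by-level growth and pruning described above resembles what is commonly called a beam search. The following Python sketch shows that control flow only, growing in a single direction; it is not the GROW program. The function name grow, the toy per-residue scores, and the number of chains retained per level are all hypothetical, and a real scoring function would evaluate goodness-of-fit to the receptor site rather than sum fixed values.

```python
def grow(seed, templates, score, keep, length):
    """Extend every retained chain by every template, score the results,
    and retain only the `keep` best chains at each growth level."""
    level = [seed]
    for _ in range(length):
        extended = [chain + (t,) for chain in level for t in templates]
        extended.sort(key=score, reverse=True)
        level = extended[:keep]
    return level

# Toy stand-in for a goodness-of-fit score (hypothetical values).
RESIDUE_SCORE = {"Ac": 0.0, "A": 1.0, "G": 0.5, "W": 2.0}

def toy_score(chain):
    return sum(RESIDUE_SCORE[r] for r in chain)

# Seed fragment (e.g. an acetyl group) followed by two growth levels.
peptides = grow(("Ac",), ("A", "G", "W"), toy_score, keep=2, length=2)
best = peptides[0]
```

Because only the retained chains are extended at each level, the number of scored candidates grows linearly with peptide length rather than exponentially, at the cost of possibly discarding a chain that would have scored well after further growth.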

In yet another embodiment, potential pharmacophoric compounds can be determined using a method based 
on an energy minimization-quenched molecular dynamics algorithm for determining energetically favorable positions 
of functional groups in the binding pockets of the invention. The method can aid in the design of molecules that 
incorporate such functional groups by modification of known ligands or de novo construction. 

For example, the multiple copy simultaneous search method (MCSS) described by Miranker et al. (1991) 
Proteins 11: 29-34 may be employed. To determine and characterize the local minima of a functional group in the 
forcefield of the protein, multiple copies of selected functional groups are first distributed in a binding pocket of 
interest on the F-box protein. Energy minimization of these copies by molecular mechanics or quenched dynamics 
yields the distinct local minima. The neighborhood of these minima can then be explored by a grid search or by 
constrained minimization. In one embodiment, the MCSS method uses the classical time dependent Hartree (TDH) 
approximation to simultaneously minimize or quench many identical groups in the forcefield of the protein. 

Implementation of the MCSS algorithm requires a choice of functional groups and a molecular mechanics 
model for each of them. Groups must be simple enough to be easily characterized and manipulated (3-6 atoms, few or 
no dihedral degrees of freedom), yet complex enough to approximate the steric and electrostatic interactions that the 
functional group would have in binding to the pocket or site of interest in the F-box protein. A preferred set is, for 
example, one in which most organic molecules can be described as a collection of such groups (Patai's Guide to the 
Chemistry of Functional Groups, ed. S. Patai (New York: John Wiley and Sons, 1989)). This includes fragments 
such as acetonitrile, methanol, acetate, methyl ammonium, dimethyl ether, methane, and acetaldehyde. 

Determination of the local energy minima in the binding pocket or site requires that many starting positions 
be sampled. This can be achieved by distributing, for example, 1,000-5,000 groups at random inside a sphere centered 
on the binding site; only the space not occupied by the protein needs to be considered. If the interaction energy of a 
particular group at a certain location with the protein is more positive than a given cut-off (e.g. 5.0 kcal/mole) the 
group is discarded from that site. Given the set of starting positions, all the fragments are minimized simultaneously 
by use of the TDH approximation (Elber et al. (1990) J Am Chem Soc 112: 9161-9175). In this method, the forces on 
each fragment consist of its internal forces and those due to the protein. The essential element of this method is that 
the interactions between the fragments are omitted and the forces on the protein are normalized to those due to a 
single fragment. In this way simultaneous minimization or dynamics of any number of functional groups in the field 
of a single protein can be performed. 

Minimization is performed successively on subsets of, for example, 100 of the randomly placed groups. 
After a certain number of step intervals, such as 1,000 intervals, the results can be examined to eliminate groups 
converging to the same minimum. This process is repeated until minimization is complete (e.g. RMS gradient of 0.01 
kcal/mole/Å). Thus the resulting energy minimized set of molecules comprises what amounts to a set of disconnected 
fragments in three dimensions representing potential pharmacophores. 
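
The sampling, cut-off, minimization, and duplicate-elimination steps described above can be sketched on a toy problem. In the Python sketch below each copy of a functional group is reduced to a single point, the protein forcefield is replaced by an invented two-well energy function, and each copy is minimized independently of the others (mirroring the omission of fragment-fragment interactions). None of the names, the well positions, the step size, or the merge distance come from MCSS or from the present disclosure; they are assumptions for illustration.

```python
import math
import random

WELLS = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]  # toy minima, purely illustrative

def nearest_well(p):
    return min(WELLS, key=lambda w: math.dist(p, w))

def energy(p):
    """Toy group-protein interaction energy standing in for a forcefield."""
    return math.dist(p, nearest_well(p)) ** 2 - 5.0

def minimize(p, steps=200, rate=0.1):
    """Steepest descent on the toy energy; each copy feels only the protein."""
    for _ in range(steps):
        w = nearest_well(p)
        p = tuple(x - rate * 2.0 * (x - wx) for x, wx in zip(p, w))
    return p

def mcss_like(n=500, radius=4.0, cutoff=5.0, merge=0.1, seed=0):
    rng = random.Random(seed)
    starts = []
    while len(starts) < n:  # random positions inside a sphere on the site
        p = tuple(rng.uniform(-radius, radius) for _ in range(3))
        if math.dist(p, (0.0, 0.0, 0.0)) <= radius:
            starts.append(p)
    kept = [p for p in starts if energy(p) <= cutoff]  # discard above cut-off
    minima = []
    for p in map(minimize, kept):
        # eliminate copies that converged to an already recorded minimum
        if all(math.dist(p, m) > merge for m in minima):
            minima.append(p)
    return minima

minima = mcss_like()
```

With this toy energy the retained copies collapse onto the two wells, so the output is a short list of distinct minima rather than hundreds of converged copies.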

The next step then is to connect the pharmacophoric pieces with spacers assembled from small chemical
entities (atoms, chains, or ring moieties). In a preferred embodiment, each of the disconnected fragments can be linked
in space to generate a single molecule using such computer programs as, for example, NEWLEAD (Tschinke et al. (1993) J
Med Chem 36: 3863-3870). The procedure adopted by NEWLEAD executes the following sequence of commands:
(1) connect two isolated moieties, (2) retain the intermediate solutions for further processing, (3) repeat the above
steps for each of the intermediate solutions until no disconnected units are found, and (4) output the final solutions,
each of which is a single molecule. Such a program can use, for example, three types of spacers: library spacers,
single-atom spacers, and fuse-ring spacers. The library spacers are optimized structures of small molecules such as
ethylene, benzene and methylamide. The output produced by programs such as NEWLEAD consists of a set of
molecules containing the original fragments now connected by spacers. The atoms belonging to the input fragments
maintain their original orientations in space. The molecules are chemically plausible because of the simple makeup of
the spacers and functional groups, and energetically acceptable because of the rejection of solutions with van der
Waals radii violations.
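
The connect-and-repeat loop described above can be illustrated with a deliberately simplified sketch. This is not NEWLEAD and does not reproduce its spacer libraries or its retention of multiple intermediate solutions; it greedily joins the closest pair of disconnected units until one unit remains or no pair is within reach. The function name, the use of fragment centroids, and the `max_spacer_length` parameter are assumptions made for this sketch.

```python
import math
from itertools import combinations


def link_fragments(centroids, max_spacer_length):
    """Greedily connect disconnected units, closest pair first.

    centroids: list of (x, y, z) fragment positions, which stay fixed.
    Returns (units, bonds): the remaining sets of fragment indices and the
    list of (i, j) fragment pairs bridged by a hypothetical spacer.
    """
    units = [{i} for i in range(len(centroids))]
    bonds = []
    while len(units) > 1:
        best = None
        # Step 1: find two isolated units that a spacer could connect.
        for a, b in combinations(range(len(units)), 2):
            for i in units[a]:
                for j in units[b]:
                    d = math.dist(centroids[i], centroids[j])
                    if d <= max_spacer_length and (best is None or d < best[0]):
                        best = (d, a, b, i, j)
        if best is None:
            break  # no remaining pair can be bridged
        _, a, b, i, j = best
        bonds.append((i, j))
        # Steps 2-3: keep the merged intermediate and repeat.
        units[a] |= units[b]
        del units[b]
    # Step 4: a single remaining unit corresponds to a single molecule.
    return units, bonds
```

A fuller treatment would branch on every admissible spacer rather than committing to the shortest one, and would reject candidates with van der Waals violations.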

Compounds and entities (e.g. ligands) of F-box proteins, in particular cdc4 proteins, or SCF complexes 
identified using the above-described methods may be prepared using methods described in standard reference sources 
utilized by those skilled in the art. For example, organic compounds may be prepared by organic synthetic methods
described in references such as March, 1994, Advanced Organic Chemistry: Reactions, Mechanisms, and Structure, 
New York, McGraw Hill. 

Test compounds and ligands which are identified using a crystal or model of the present invention can be 
screened in assays such as those well known in the art. Screening may be, for example, in vitro, in cell culture, and/or
in vivo. Biological screening assays preferably centre on activity-based response models, binding assays (which
measure how well a compound binds to a binding pocket of a receptor), and bacterial, yeast, and animal cell lines
(which measure the biological effect of a compound in a cell). The assays may be automated for high throughput
screening in which large numbers of compounds can be tested to identify compounds with the desired activity. The
biological assay may also be an assay for the binding activity of a compound that selectively binds to the binding
pocket compared to other receptors.
LIGANDS/COMPOUNDS IDENTIFIED BY SCREENING METHODS

The present invention provides a ligand or compound identified by a screening method of the present 
invention. A ligand or compound may have been designed rationally by using a model according to the present 
invention. A ligand or compound identified using the screening methods of the invention may specifically associate
with a target compound, or part thereof (e.g. a binding pocket). In the present invention the target compound may be
the F-box protein or SCF complex or part thereof, or a molecule that is capable of associating with an F-box protein or 
SCF complex or part thereof (for example a substrate). 

A ligand or compound identified using a screening method of the invention may act as a "modulator", i.e. a 
compound which affects the activity of an F-box protein or SCF complex. A modulator may reduce, enhance or alter
the biological function of an F-box protein or an SCF E3 ubiquitin ligase. For example a modulator may modulate the
capacity of the F-box protein or an SCF E3 ubiquitin ligase to interact with its substrate. An alteration in biological 
function may be characterised by a change in specificity. For example, a modulator may cause the F-box protein to 
interact with a different substrate. In order to exert its function, the modulator commonly binds to a binding pocket. 

A "modulator" which is capable of reducing the biological function of the enzyme may also be known as an
inhibitor. Preferably an inhibitor reduces or blocks the capacity of the F-box protein or an SCF E3 ubiquitin ligase to
interact with its substrate, thus reducing or blocking ubiquitination of the substrate. The inhibitor may mimic the
binding of a substrate, for example, it may be a substrate analogue. A substrate analogue may be designed by
considering the interactions between the substrate and the F-box protein or an SCF E3 ubiquitin ligase (for example,
by using information derivable from the crystal of the invention) and specifically altering one or more groups (as
described above).

The present invention also provides a method for modulating the activity of an F-box protein, in particular a 
cdc4 protein, using a modulator according to the present invention. The invention also provides a method for 
modulating (e.g. potentiating or inhibiting) ubiquitination of a substrate by an SCF E3 ubiquitin ligase, by potentiating 
or inhibiting the substrate binding pocket of the ligase. Inhibition of ubiquitination of a substrate may decrease 
signaling and inhibit cellular processes that may be involved in disease. It would be possible to monitor cellular
processes following such treatments by a number of methods known in the art. 

A modulator may be an agonist, partial agonist, partial inverse agonist or antagonist of an F-box protein. 

As mentioned above, a substrate or an identified ligand may act as a ligand model (for example, a template) 
for the development of other compounds. A modulator may be a mimetic of a substrate or ligand.

Like the test compound (see above) a modulator may be one of a variety of different sorts of molecule. (See
examples herein.) A modulator may be an endogenous physiological compound, or it may be a natural or synthetic
compound. The term "modulator" also refers to a chemically modified ligand or substrate. 

The technique suitable for preparing a modulator will depend on its chemical nature. For example, peptides 
can be synthesized by solid phase techniques (Roberge JY et al (1995) Science 269: 202-204) and automated 
synthesis may be achieved, for example, using the ABI 431A Peptide Synthesizer (Perkin Elmer) in accordance with
the instructions provided by the manufacturer. Once cleaved from the resin, the peptide may be purified by
preparative high performance liquid chromatography (e.g., Creighton (1983) Proteins: Structures and Molecular
Principles, WH Freeman and Co, New York NY). The composition of the synthetic peptides may be confirmed by
amino acid analysis or sequencing (e.g., the Edman degradation procedure; Creighton, supra).

If a modulator is a nucleotide, or a polypeptide expressible therefrom, it may be synthesized, in whole or in
part, using chemical methods well known in the art (see Caruthers MH et al (1980) Nuc Acids Res Symp Ser 215-23,
Horn T et al (1980) Nuc Acids Res Symp Ser 225-232), or it may be prepared using recombinant techniques well
known in the art.

Organic compounds may be prepared by organic synthetic methods described in references such as March, 
1994, Advanced Organic Chemistry: Reactions, Mechanisms, and Structure, New York, McGraw Hill.

The invention also relates to classes of modulators of F-box proteins, in particular cdc4 proteins based on the 
structure and shape of a substrate or component thereof, defined in relation to the substrate's spatial association with a 
crystal structure of the invention or part thereof. 

A class of modulators may comprise a compound containing a structure of a CPD motif. In particular, the 
modulators can comprise a CPD motif having the structural coordinates of the CPD motif in the active site binding
pocket of an F-box protein. In an embodiment, a modulator comprises the structural coordinates of a CPD motif 
having the structural coordinates listed in Table 6. 

The invention contemplates all optical isomers and racemic forms of the modulators of the invention. 
PHARMACEUTICAL COMPOSITION

The present invention also provides for the use of a modulator according to the invention, in the manufacture
of a medicament to treat and/or prevent a disease in a mammalian patient. There is also provided a pharmaceutical
composition comprising such a modulator and a method of treating and/or preventing a disease comprising the step of 
administering such a modulator or pharmaceutical composition to a subject, preferably a mammalian patient. 

The pharmaceutical compositions may be for human or animal usage in human and veterinary medicine and 
will typically comprise a pharmaceutically acceptable carrier, diluent, excipient, adjuvant or combination thereof.

Acceptable carriers or diluents for therapeutic use are well known in the pharmaceutical art, and are 
described, for example, in Remington's Pharmaceutical Sciences, Mack Publishing Co. (A. R. Gennaro edit. 1985). 
The choice of pharmaceutical carrier, excipient or diluent can be selected with regard to the intended route of 
administration and standard pharmaceutical practice. The pharmaceutical compositions may comprise as - or in
addition to - the carrier, excipient or diluent any suitable binder(s), lubricant(s), suspending agent(s), coating agent(s),
solubilising agent(s).

Preservatives, stabilizers, dyes and even flavouring agents may be provided in the pharmaceutical 
composition. Examples of preservatives include sodium benzoate, sorbic acid and esters of p-hydroxybenzoic acid. 
Antioxidants and suspending agents may also be used.

The routes for administration (delivery) include, but are not limited to, one or more of: oral (e.g. as a tablet,
capsule, or as an ingestible solution), topical, mucosal (e.g. as a nasal spray or aerosol for inhalation), nasal,
parenteral (e.g. by an injectable form), gastrointestinal, intraspinal, intraperitoneal, intramuscular, intravenous,
intrauterine, intraocular, intradermal, intracranial, intratracheal, intravaginal, intracerebroventricular, intracerebral,
subcutaneous, ophthalmic (including intravitreal or intracameral), transdermal, rectal, buccal, vaginal, epidural,
and sublingual.

Where the pharmaceutical composition is to be delivered mucosally through the gastrointestinal mucosa, it 
should be able to remain stable during transit through the gastrointestinal tract; for example, it should be resistant to
proteolytic degradation, stable at acid pH and resistant to the detergent effects of bile. 

Where appropriate, the pharmaceutical compositions can be administered by inhalation, in the form of a
suppository or pessary, topically in the form of a lotion, gel, hydrogel, solution, cream, ointment or dusting powder,
by use of a skin patch, orally in the form of tablets containing excipients such as starch or lactose or chalk, or in
capsules or ovules either alone or in admixture with excipients, or in the form of elixirs, solutions or suspensions
containing flavouring or colouring agents, or they can be injected parenterally, for example intravenously,
intramuscularly or subcutaneously. For parenteral administration, the compositions may be best used in the form of a
sterile aqueous solution which may contain other substances, for example enough salts or monosaccharides to make 
the solution isotonic with blood. The aqueous solutions should be suitably buffered (preferably to a pH of from 3 to 
9), if necessary. The preparation of suitable parenteral formulations under sterile conditions is readily accomplished 
by standard pharmaceutical techniques well-known to those skilled in the art. 

If the agent of the present invention is administered parenterally, then examples of such administration
include one or more of: intravenously, intra-arterially, intraperitoneally, intrathecally, intraventricularly,
intraurethrally, intrasternally, intracranially, intramuscularly or subcutaneously administering the agent; and/or by 
using infusion techniques. 

For buccal or sublingual administration the compositions may be administered in the form of tablets or 
lozenges which can be formulated in a conventional manner.

The tablets may contain excipients such as microcrystalline cellulose, lactose, sodium citrate, calcium 
carbonate, dibasic calcium phosphate and glycine, disintegrants such as starch (preferably corn, potato or tapioca 
starch), sodium starch glycollate, croscarmellose sodium and certain complex silicates, and granulation binders such 
as polyvinylpyrrolidone, hydroxypropylmethylcellulose (HPMC), hydroxypropylcellulose (HPC), sucrose, gelatin and 
acacia. Additionally, lubricating agents such as magnesium stearate, stearic acid, glyceryl behenate and talc may be
included. 

Solid compositions of a similar type may also be employed as fillers in gelatin capsules. Preferred excipients 
in this regard include lactose, starch, cellulose, milk sugar or high molecular weight polyethylene glycols. For 
aqueous suspensions and/or elixirs, the agent may be combined with various sweetening or flavouring agents,
colouring matter or dyes, with emulsifying and/or suspending agents and with diluents such as water, ethanol,
propylene glycol and glycerin, and combinations thereof. 

As indicated, a therapeutic agent (e.g. modulator) of the present invention can be administered intranasally or 
by inhalation and is conveniently delivered in the form of a dry powder inhaler or an aerosol spray presentation from a 
pressurised container, pump, spray or nebuliser with the use of a suitable propellant, e.g. dichlorodifluoromethane,
trichlorofluoromethane, dichlorotetrafluoroethane, a hydrofluoroalkane such as 1,1,1,2-tetrafluoroethane (HFA
134A™) or 1,1,1,2,3,3,3-heptafluoropropane (HFA 227EA™), carbon dioxide or other suitable gas. In the case of a
pressurised aerosol, the dosage unit may be determined by providing a valve to deliver a metered amount. The
pressurised container, pump, spray or nebuliser may contain a solution or suspension of the active compound, e.g.
using a mixture of ethanol and the propellant as the solvent, which may additionally contain a lubricant, e.g. sorbitan
trioleate. Capsules and cartridges (made, for example, from gelatin) for use in an inhaler or insufflator may be 
formulated to contain a powder mix of the agent and a suitable powder base such as lactose or starch. 

Therapeutic administration of polypeptide modulators may also be accomplished using gene therapy. A 
nucleic acid including a promoter operatively linked to a heterologous polypeptide may be used to produce high-level
expression of the polypeptide in cells transfected with the nucleic acid. DNA or isolated nucleic acids may be
introduced into cells of a subject by conventional nucleic acid delivery systems. Suitable delivery systems include
liposomes, naked DNA, receptor-mediated delivery systems, and viral vectors such as retroviruses, herpes viruses,
and adenoviruses.
APPLICATIONS 

The invention further provides a method of treating a mammal, the method comprising administering to a
mammal a modulator or pharmaceutical composition of the present invention.

In particular, the invention contemplates a method of treating or preventing a condition or disease associated 
with an F-box protein or SCF complex in a cellular organism, comprising: 

(a) administering a modulator of the invention in an acceptable pharmaceutical preparation; and
(b) activating or inhibiting an F-box protein or SCF complex or their interaction with a substrate to treat
or prevent the disease.

The invention provides for the use of a modulator identified by the methods of the invention in the 
preparation of a medicament to treat or prevent a disease in a cellular organism. Use of modulators of the invention to 
manufacture a medicament is also provided.

Typically, a physician will determine the actual dosage of a modulator or pharmaceutical composition of the
invention that will be most suitable for an individual subject and it will vary with the age, weight and response of the
particular patient and severity of the condition. There can, of course, be individual instances where higher or lower 
dosage ranges are merited. 

The specific dose level and frequency of dosage for any particular patient may be varied and will depend 
upon a variety of factors including the activity of the specific compound employed, the metabolic stability and length
of action of that compound, the age, body weight, general health, sex, diet, mode and time of administration, rate of
excretion, drug combination, the severity of the particular condition, and the individual undergoing therapy. By way
of example, the pharmaceutical composition of the present invention may be administered in accordance with a 
regimen of 1 to 10 times per day, such as once or twice per day. 

For oral and parenteral administration to human patients, the daily dosage level of the agent may be in single 
or divided doses.

The modulators and compositions of the invention may be useful in the prevention and treatment of 
conditions involving aberrant F-box proteins or SCF complexes. In particular the modulators and compositions may 
be useful in treating cancer or Alzheimer's Disease. 

Conditions which may be prevented or treated in accordance with the invention include but are not limited to 
lymphoproliferative conditions, and malignant and pre-malignant conditions. Malignant and pre-malignant
conditions may include solid tumors, B cell lymphomas, chronic lymphocytic leukemia, chronic myelogenous
leukemia, prostate hypertrophy, Hirschsprung disease, glioblastoma, breast and ovarian cancer, adenocarcinoma of the
salivary gland, premyelocytic leukemia, prostate cancer, multiple endocrine neoplasia type IIA and IIB, medullary
thyroid carcinoma, papillary carcinoma, papillary renal carcinoma, hepatocellular carcinoma, gastrointestinal stromal
tumors, sporadic mastocytosis, acute myeloid leukemia, large cell lymphoma or Alk lymphoma, chronic myeloid
leukemia, hematological/solid tumors, papillary thyroid carcinoma, stem cell leukemia/lymphoma syndrome, acute
myelogenous leukemia, osteosarcoma, multiple myeloma, preneoplastic liver foci, and resistance to chemotherapy. 

Modulators and compositions of the invention may be used to restore function to a mutant F-box protein, in
particular a mutant cdc4 polypeptide. Modulators and compositions of the invention, in particular inhibitors, may also
have utility in treating diseases associated with F-box mutations, in particular cdc4 polypeptide mutations, in
combination with other cancer mutations, Notch pathway mutations or presenilin mutations. 

A modulator of the invention may be used to promote binding of a substrate to an SCF complex. In an
embodiment a modulator that associates (preferably with high affinity) with a binding pocket of an SCF complex as
described herein is linked to an agent that binds to a substrate to be ubiquitinated by an SCF complex. A
modulator-agent-substrate complex where the modulator is derived from a binding pocket of an F-box protein as
described herein may be used in treating diseases associated with a mutant F-box protein.

Therapeutic efficacy and toxicity of compositions and modulators of the invention may be determined by 
standard pharmaceutical procedures in cell cultures or with experimental animals, such as by calculating the ED50 (the
dose therapeutically effective in 50% of the population) or LD50 (the dose lethal to 50% of the population) statistics.
The therapeutic index is the dose ratio of toxic to therapeutic effects and it can be expressed as the LD50/ED50 ratio.
Pharmaceutical compositions that exhibit large therapeutic indices are preferred. 
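
As a simple numerical illustration, and assuming the conventional definition in which the therapeutic index is the LD50 divided by the ED50 (so that larger values are preferred), the calculation can be sketched as follows. The function name and the example doses are hypothetical and are not data from this disclosure.

```python
def therapeutic_index(ld50, ed50):
    """Return LD50 / ED50 for doses expressed in the same units.

    Assumes the conventional definition, under which a larger index
    indicates a wider margin between effective and lethal doses.
    """
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("doses must be positive")
    return ld50 / ed50


# Hypothetical doses in mg/kg, chosen only to show the arithmetic.
example_index = therapeutic_index(ld50=500.0, ed50=25.0)
```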

The invention will now be illustrated by the following non-limiting examples: 
EXAMPLE 1 

The following methods were used in the investigation described in the example:
Methods

Cloning, Protein expression and Purification 

The Cdc4 fragment employed for crystallization, which is deleted for terminal residues 1 to 262 and 745 to
779, extends from the beginning of the F-box domain to the end of the WD40 repeat domain. The N-terminal deletion
removes a poorly conserved sequence of 226 amino acids and a conserved element of approximately 40 residues
termed the D-domain that immediately precedes the F-box domain and that has been implicated in molecular
multimerization. The C-terminal deletion removes residues not conserved amongst different Cdc4 homologues. Both
Skp1 and Cdc4 were engineered to remove flexible loops, namely residues 36-55 in Skp1 and residues 601 to 604 and
609 to 624 in Cdc4.

A PCR product containing CDC4(263-744) was cloned into the EheI (SfoI) and BamHI sites of pPROEX
HTb. In parallel, a PCR product containing SKP1Δ37-64 was cloned into the NdeI and BamHI sites of
pGEX2T-TEV. An SspI GST-SKP1-containing fragment from this construct was cloned into the StuI site of the Cdc4
construct described above such that CDC4 and SKP1 were in opposite orientations. A non-homologous region in CDC4
encoding amino acids 602-624 was then replaced by the DNA sequence GGCGAACTG [SEQ ID NO. 39], which
encodes the shorter peptide sequence Gly-Glu-Leu.
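
The correspondence between the replacement DNA sequence and the tripeptide can be checked with a short translation sketch using the standard genetic code. The helper names below are introduced for illustration only.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}


def translate(dna):
    """Translate a coding-strand DNA sequence into three-letter residues."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return "-".join(THREE_LETTER[CODON_TABLE[c]] for c in codons)


print(translate("GGCGAACTG"))  # prints Gly-Glu-Leu
```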

The Cdc4/Skp1 complex was expressed in E. coli B834 (DE3) cells grown in minimal media supplemented
with a mixture of selenomethionine (40 µg/ml) and methionine (0.4 µg/ml). Cells were induced with 0.2 mM
isopropyl-β-D-thiogalactopyranoside (IPTG) at 15°C overnight. Cell pellets were resuspended in 50 mM HEPES pH
7.5, 500 mM NaCl, 10% glycerol, and 5 mM imidazole, lysed with a cell homogenizer (Emulsiflex C-5, Avestin)
followed by a 20 sec sonication (Vibra cell, Betatec). The lysate was then clarified by centrifugation at 65 000 x g for
40 min. The supernatant was loaded onto a 5 ml metal chelating column (Pharmacia) and eluted in high imidazole.
This fraction was loaded onto a glutathione-sepharose column (Pharmacia) and the bound complex was eluted by
overnight digestion with TEV protease (Canadian Life). Eluted protein was dialysed to remove DTT and EDTA and
reloaded onto a metal chelating column. The flow through containing the complex was concentrated and applied to a
Superdex S75 gel filtration column (Pharmacia). Fractions containing the complex were concentrated in a buffer
containing 10 mM HEPES pH 7.5, 250 mM NaCl, and 1 mM DTT.
Crystallization, Data Collection, and Structure Determination 

Hanging drops containing 1 µl of 21 mg/ml protein plus 1.2 molar equivalents of the CPD peptide sequence
were mixed with equal volumes of reservoir buffer containing 0.1 M Tris pH 8.5, and 1.5 M ammonium sulphate.
Crystals were flash frozen in reservoir buffer supplemented with 15% glycerol. Crystals of the space group P3₂ (a
= 107.7 Å, b = 107.7 Å, c = 168.3 Å, α = β = 90°, γ = 120°), with two molecules of the complex in the asymmetric unit,
were obtained at 20°C. A Multiwavelength Anomalous Dispersion (MAD) experiment was performed on a frozen crystal at
the Advanced Photon Source (Argonne, IL) (APS) beamlines BM 14-B and BM 14-D (λ1 = 0.9798 Å, λ2 = 0.9800 Å,
λ3 = 0.9000 Å) using a Quantum 4 ADSC CCD detector. Data processing and reduction were carried out with the
HKL program suite (Otwinowski and Minor, 1997). The programs SHARP (de La Fortelle and Bricogne, 1997) and
SnB (Miller et al., 1994) were used in combination to locate and refine 19 of the 22 Se sites. Following density
modification with Solomon (Abrahams and Leslie, 1996), a partial model was generated using O (Jones et al., 1991)
and refined using CNS (Brunger et al., 1998) to a working R value of 24.09% and a free R value of 28.71%. Pertinent
statistics for data collection and refinement are shown in Table 1.
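
For orientation, the unit cell volume implied by the reported cell constants and the photon energies corresponding to the three MAD wavelengths can be computed with the general triclinic volume formula and the relation E = hc/λ. This is a sketch only; the function names are introduced here, and the cell angles are taken as α = β = 90°, γ = 120°, which is an assumption consistent with a trigonal cell having a = b.

```python
import math

HC_KEV_ANGSTROM = 12.398419843  # Planck constant times speed of light, keV·Å


def unit_cell_volume(a, b, c, alpha, beta, gamma):
    """Volume (Å^3) of a general unit cell; lengths in Å, angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


def wavelength_to_kev(wavelength_angstrom):
    """Photon energy in keV for an X-ray wavelength given in Å."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


volume = unit_cell_volume(107.7, 107.7, 168.3, 90.0, 90.0, 120.0)
energies = [wavelength_to_kev(w) for w in (0.9798, 0.9800, 0.9000)]
```

The first two wavelengths lie close to 12.65 keV, near the selenium K absorption edge, which is consistent with the use of selenomethionine-labelled protein.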
The increased order of the second CPD may be due to a crystal packing interaction involving the C-terminus
of the CPD. While the main chain termini of the second CPD are discernible (Figure 3e), the precise backbone and
side chain conformations for the P-2 Leu, P-3 Gly, P+4 Ser, and P+5 Gly are less reliably determined.
Mutagenesis 

Point mutants were obtained by a PCR-based approach using oligos provided in supplementary information 
and Pfu polymerase (Stratagene). Once verified by sequencing the mutants were sub-cloned into the appropriate
vectors as listed in the supplementary information. Alanine insertion mutations were obtained using the Kunkel 
method (ref) and then sub-cloned into the vectors indicated in the supplementary information. 
Shuffle experiments 

All mutants on a TRP1 ARS CEN plasmid were transformed into a cdc4Δ strain (MT 1259) containing a
wildtype copy of CDC4 on a URA3 ARS CEN plasmid. Cells were plated on either Trp- Ura- or 5-FOA medium for
2 days at 30°C. Viable cells on 5-FOA were grown in Trp- medium and transformed with either wild type
GAL1-SIC1, GAL1-SIC T45A, or GAL1-SIC T33V on a LEU2 ARS CEN plasmid. Cells were then plated on Leu- Trp-
plates containing either glucose or galactose and incubated for 2 days at 30°C.
Sic1-Cdc4 interactions

Bacterially expressed His6-Sic1 was phosphorylated with Cln2-Cdc28 kinase purified from baculovirus
infected Sf9 cells as described before (Nature paper). 1 µg of WT or mutant Cdc4-GST-Skp1, immobilized on
GSH-Sepharose resin, was incubated with 0.5 µg phospho-Sic1 at 4°C for 1 h and washed 4 times. Captured complexes
were resolved on SDS-PAGE and Sic1 visualized by anti-Sic1 Western blotting and ECL. For IEF-2D analysis, several
Sic1 phosphorylation reactions were carried out for different time periods to obtain a spectrum of Sic1 that were
phosphorylated at different numbers of its nine CDK sites. This pool of phospho-Sic1 (2.5 µg) was incubated with 5 µg
of WT or mutant Cdc4-GST-Skp1 as described above. Different phosphorylation states of Sic1 were separated by
denaturing isoelectric focusing (IEF)-2D gel electrophoresis and visualized by anti-Sic1 Western blotting and ECL.
IEF was performed using pH 3-10 NL Immobiline gel strips and the IPGphor IEF system (Amersham Pharmacia).
Results 

The x-ray crystal structure presented herein consists of a ternary complex of yeast Skp1 bound to a fragment
of Cdc4, and a 9mer high affinity CPD phosphopeptide (Figure 2). The Cdc4 fragment, which is deleted for terminal
residues 1 to 262 and 745 to 779, extends from the beginning of the F-box domain to the end of the WD40 repeat 
domain. 

Skp1-Cdc4 F-box: Skp1 forms an elongated structure with a mixed α/β topology identical to that reported for human
Skp1 (Schulman et al., 2000). The topology consists of a three-strand (denoted β1 to β3) β-sheet and eight α-helices,
denoted α1 to α8 (Figure 2a). The structure of Cdc4 from its amino terminus consists of an F-box domain, an
α-helical extension or linker, and a WD40 repeat domain (Figure 2a,b). The F-box domain comprises five α-helices
(denoted α0 to α4). This topology differs slightly from that reported for the F-box domain of hSkp2 (Schulman et al.,
2000), which consists of a loop region L1 and three helices denoted α1 to α3. Helix α0 in Cdc4 corresponds most
closely in sequence and position to the loop region L1 of Skp2 while a half turn remnant of helix α4 is discernible in
the transition sequence between the Skp2 F-box and Leucine Repeat domains. As observed in the Skp1-Skp2
complex, Skp1 and the F-box domain of Cdc4 associate by the interdigitation of helices α0 to α3 of Cdc4 with helices
α5 to α8 of Skp1. This mode of inter-domain association is characterized by a common and continuous hydrophobic
core that spans the two protein domains.
Cdc4 helical linker and WD40 domain: Following the F-box domain of Cdc4 is a helical extension that forms a
structured bridge to the WD40 repeat domain. The helical extension consists of two α-helices, α5 and α6, that together
with helices α3 and α4 of the F-box domain form a stalk and pedestal like structure that connects and orients the
WD40 domain (Figure 2c).

Eight copies of the WD40 repeat motif in Cdc4 form an 8-blade β-propeller structure. Each blade, composed
of 4 anti-parallel β-strands, is related by 8-fold pseudo symmetry about a central axis (Figure 2b). As first shown for
the G-protein β subunit (Sondek 1996), the WD40 repeat motif of approximately 40 amino acids composes the
outer β-strand of one propeller blade and the inner three strands of the adjacent blade. A continuous circular
arrangement of blades is formed by the association of the first and last WD40 repeat motifs to form the 8th propeller
blade. Interestingly, a 7-blade β-propeller structure was anticipated for Cdc4 and its orthologues (and generally all
WD40 repeat F-box adaptors), which is attributable in part to the cryptic nature of the 8th WD40 repeat motif (Figure
1). Based on the structure based sequence alignment in Figure 1, it is predicted that the other WD40 class of F-box
adaptor proteins (i.e. the Met30 orthologues and β-TRCP orthologues) will form 7-blade β-propeller structures.

The WD40 repeat domain forms a disk-like structure characterized by a cavity in the middle and two
opposing circular surfaces of slightly different size. The smaller of the two surfaces composes the CPD binding site.
On the bottom surface is anchored helix α6 of the helical extension, which inserts obliquely between propeller blades
P7 and P8. Interestingly, β-propeller blade 2 consists of 5 β-strands. The outermost strand of this blade, denoted β9′,
is non-standard and arises from an amino acid insert in the connecting loop between β-strands 12 and 13. Strand β9′
forms a parallel arrangement with strand β9, which differs from the anti-parallel architecture of all other β-strand
elements in the WD40 domain structure. A large insert in the β12-β13 linker is absent from dr, ce, hu, mu Cdc4
homologues, suggesting that a 5 β-strand propeller blade 2 is unique to the fungal homologues.

A fixed orientation between the F-box domain and WD40 domain of Cdc4 is maintained largely through the
integrity of the stalk like helix α6 of the helical extension (Figure 2c). Helix α6 is 30 Å in length, and is anchored at
its N-terminus to the hydrophobic core of the F-box/helical extension and at its C-terminus to the hydrophobic core of
the WD40 repeat domain. In contrast to the intermolecular connection between Skp1 and the F-box domain, the
connection between the F-box domain and WD40 repeat domain appears less rigidly structured.

At its amino terminus, helix α6 anchors to the F-box through hydrophobic interactions involving α6 residues
Phe 355 and Leu 356 and F-box residues Ile 295, Ile 296, Leu 315, Trp 316, and Leu 319 (Figure 2c). Helix α5
packs alongside the base of helix α6 opposite to the F-box domain through hydrophobic packing interactions
involving Tyr 342, Leu 338 and Leu 334. At its C-terminus, helix α6 anchors through hydrophobic interactions
involving residues Trp 365 and Ile 364 with WD repeat residues Val 687, Ile 696, Leu 726 and Phe 743 in β-propeller
blades 7 and 8. Asn 364 of helix α6 also forms a tight hydrogen bond interaction with the backbone carbonyl group
of Phe 743 in propeller blade 8. The noted interactions (with the exception of interactions involving helix α5) involve
residues that are conserved across most WD40 F-box adaptor proteins including the Met30 orthologues and β-TRCP
orthologues, which suggests that the linkage between WD40 and F-box domains is similarly structured in these
proteins. Helix α6 in β-TRCP, however, appears to be one α-helical turn longer (Figure 1).

Outside of stalk helix α6, only two close contacts (< 3.5 Å) are observed between the WD40 repeat domain
and other regions of Cdc4. These contacts consist of hydrogen bonds between Asn 684 and Arg 700 in the loop regions
of propeller blade 7 with Glu 323 in the α4-α5 linker of the helical extension. Both hydrogen bonds are maintained in
the two Cdc4 molecules of the crystal asymmetric unit, but all three residues are poorly conserved amongst Cdc4
orthologues (Figure 1). The lack of additional stabilizing interactions suggests that the F-box/WD40 domain linkage
is not exceedingly rigid, and indeed, the WD40 domains in the two molecules of the asymmetric unit differ relative to
their F-box domains by a 5 degree rotation about helix α6.

WD40 domain phosphopeptide recognition: A nine-mer CPD consisting of the sequence acetyl-
Gly,Leu,Leu,pThr,Pro,Pro,Gln,Ser,Gly-amide [SEQ ID NO.40] is bound to the front face of the WD40 domain of
Cdc4. In the two WD40 repeat domain/CPD complexes of the crystal asymmetric unit, a central core of 4 CPD 
residues corresponding to the sequence Leu, pThr, Pro, Pro [SEQ ID NO.41] is well ordered. 

These residues have been modeled unambiguously in unbiased experimental electron density maps (Figure
3e). Interpretable electron density is also apparent for the P-2 Leu, P-3 Gly, P+3 Gln, P+4 Ser and P+5 Gly positions
of the second CPD (no interpretable electron density is apparent for these residues in the first CPD). The CPD binds
in an extended manner across β-propeller blade 2, with the N-terminus oriented towards the central cavity of the
WD40 repeat domain and the C-terminus oriented towards the outer rim. The CPD binding surface of Cdc4 is
composed of invariant and highly conserved residues from β-propeller blades 1 to 6 and 8 and represents the most
conserved part of the WD40 repeat domain surface (Figure 3a,c).
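
By way of illustration only, the P-position numbering used for the CPD can be sketched in a few lines of Python. The sequence is the nine-mer of SEQ ID NO.40 with the acetyl and amide caps omitted; the function name `label_positions` and the use of three-letter residue codes are assumptions made for this sketch and form no part of the disclosure.

```python
# CPD nine-mer as given in the text (SEQ ID NO.40), without the acetyl and amide caps.
CPD = ["Gly", "Leu", "Leu", "pThr", "Pro", "Pro", "Gln", "Ser", "Gly"]


def label_positions(residues, anchor="pThr"):
    """Return (label, residue) pairs numbered relative to the anchor residue (P0)."""
    p0 = residues.index(anchor)  # index of the phosphorylated residue
    labelled = []
    for i, res in enumerate(residues):
        offset = i - p0
        # Offsets are written with an explicit sign, e.g. P-2 or P+1; the anchor is P0.
        label = "P0" if offset == 0 else f"P{offset:+d}"
        labelled.append((label, res))
    return labelled


for label, res in label_positions(CPD):
    print(f"{label:>3}  {res}")
```

Under this numbering the well ordered core Leu, pThr, Pro, Pro occupies positions P-1 to P+2.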

Cdc4 displays an absolute requirement for phosphorylation at Ser or Thr at the P0 position of the CPD. In
the crystal structure, the P0 pThr phosphate group is coordinated by an intricate network of electrostatic interactions
and hydrogen bonds involving residues absolutely conserved across all Cdc4 orthologues (Figure 3c). The P0
phosphate group forms direct electrostatic interactions with the guanidinium groups of Arg 485, Arg 467 and Arg 534
and a direct hydrogen bond with the side chain of Tyr 548. The side chain of Tyr 548 is coordinated by stacking
interactions with the guanidinium group of Arg 572, which in turn is coordinated by a hydrogen bond to the side chain
of Tyr 574. Although Cdc4 shows a strong (6-fold) preference for pThr over pSer, the structural basis for this
selectivity is not obvious. In the crystal structure, the Cγ methyl group of Thr is directed towards solvent and does not
make contact with the CPD binding surface of Cdc4. This binding preference may be due to the greater side chain
rotational stability arising from the Thr β-branch structure.

Cdc4 displays an absolute requirement for proline in the P+1 CPD position. In the crystal structure, the P+1
proline side chain projects into a three-sided pocket on the CPD binding surface. The side chain of Trp 426 forms one
side of the pocket and packs in a coplanar manner with the P+1 proline side chain. On its other side, the Trp 426 side
chain packs tightly against the side chain of Thr 386. The opposite side of the P+1 binding pocket is formed by the
side chain of Arg 485. Arg 485 coordinates the P+1 proline through van der Waals side chain interactions and through
a direct hydrogen bond to the proline backbone carbonyl group. This represents the sole direct hydrogen bond
interaction between Cdc4 and the CPD main chain. The side chains of Thr 441 and Thr 465 define the remaining side
of the P+1 proline binding pocket, with the Cγ side chain groups composing a hydrophobic surface. The hydroxyl
groups of Thr 441 and Thr 465 orient away from the P+1 binding pocket, where they are well placed to influence binding
specificity for CPD residues C-terminal to the P+1 position. Unlike Trp 426, Thr 386 and Arg 485, which are
invariant amongst the Cdc4 orthologues, Thr 441 and Thr 465 are substituted with Ile in the S. pombe Cdc4
orthologue Pop1. The modeling studies suggest that this substitution has no effect on the P+1 binding pocket but may
perturb CPD binding specificity C-terminal to the P+1 position through steric effects (Ile is bigger than Thr) and by
increasing the hydrophobic character of the surface.

Cdc4 displays a strong preference for the hydrophobic residues Leu, Ile and Pro at the P-1 and P-2 CPD
positions. In the crystal structure, the P-1 Leu side chain is oriented towards a hydrophobic pocket composed of
invariant residues Trp 426, Trp 717 and Thr 386, and the conserved hydrophobic residue Val 384. While less
precisely modeled, the main chain position of the P-2 Leu lies in close proximity to a third hydrophobic pocket composed
of the invariant residue Tyr 574 and the conserved hydrophobic residues Met 590 and Leu 634.

Cdc4 displays little preference for residues in the P+2 to P+5 CPD positions. In the crystal structure, the side
chain of P+2 Pro is directed towards solvent (which would account for the lack of selectivity at this position), while
the main chain conformation of the P+1 and P+2 Pro residues causes the CPD to kink away from the peptide-binding
surface from the P+2 position onwards. As a result, only one additional close contact with Cdc4 is made by the CPD
following the P+1 position, which consists of a weak hydrogen bond (sub-optimal geometry) between the P+3 Gln
side chain and the side chain of Arg 485.

Adjacent to the P+1 proline binding pocket, Ser 464, Thr 441 and Thr 465 are well placed to exert specificity
for the +3 and +4 CPD positions if an extended rather than kinked conformation of the CPD were adopted. As noted,
Thr 441 and Thr 465 are substituted with Ile in the Cdc4 orthologue Pop1 in S. pombe. While nothing is known about the
effect of this substitution on CPD recognition, it is predicted that this could have some effect on substrate selectivity
for the P+2 to P+5 CPD positions.

Cdc4 displays strong selectivity against Arginine and Lysine in positions -2, -1, +2, +3 and [illegible]. This
selectivity may be due to electrostatic repulsion generated by the invariant Cdc4 residues Arginine 572, 534, 467, 485
and 443, which dominate the local electrostatic character of the CPD binding site. Lys 402 is also well placed to
contribute to repulsive effects, but this position is not conserved amongst the Cdc4 orthologues. The selectivity
against positively charged residues in the P-2 to P-1 CPD positions can also be accounted for in part by the hydrophobic
nature of the P-1 and P-2 binding pockets and indeed, oppositely charged Glu and Asp residues are also disfavored at
these CPD positions.
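
The sequence preferences described above may be summarised, purely as a sketch, by a small rule check in Python. The rules encode only what is stated in the text (pSer or pThr at P0, Pro at P+1, no Arg or Lys at -2, -1, +2 or +3, no Glu or Asp at -2 or -1, and a preference for Leu, Ile or Pro at -2 and -1). The function names, the dictionary representation and the treatment of the preferences as pass/fail rules are assumptions of this sketch; the disclosure describes graded binding preferences rather than a binary test.

```python
HYDROPHOBIC = {"Leu", "Ile", "Pro"}  # preferred at P-1 and P-2 per the text
BASIC = {"Arg", "Lys"}               # disfavored at -2, -1, +2 and +3 per the text
ACIDIC = {"Glu", "Asp"}              # also disfavored at P-1 and P-2 per the text


def check_cpd(residues_by_position):
    """Map of integer offsets (0 = phospho-residue) to residue names -> list of rule violations."""
    problems = []
    if residues_by_position.get(0) not in {"pThr", "pSer"}:
        problems.append("P0 must be pThr or pSer")
    if residues_by_position.get(1) != "Pro":
        problems.append("P+1 must be Pro")
    for pos in (-2, -1, 2, 3):
        if residues_by_position.get(pos) in BASIC:
            problems.append(f"basic residue at P{pos:+d}")
    for pos in (-2, -1):
        if residues_by_position.get(pos) in ACIDIC:
            problems.append(f"acidic residue at P{pos:+d}")
    return problems


def preferred_hydrophobic_positions(residues_by_position):
    """Return which of P-2 and P-1 carry one of the preferred hydrophobic residues."""
    return [pos for pos in (-2, -1) if residues_by_position.get(pos) in HYDROPHOBIC]


# Cyclin E derived CPD of SEQ ID NO.40, keyed by P-position.
cyclin_e_cpd = {-3: "Gly", -2: "Leu", -1: "Leu", 0: "pThr", 1: "Pro",
                2: "Pro", 3: "Gln", 4: "Ser", 5: "Gly"}
print(check_cpd(cyclin_e_cpd))
print(preferred_hydrophobic_positions(cyclin_e_cpd))
```

The cyclin E derived peptide passes every rule, consistent with its description as a high affinity CPD.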

Comparison with Skp1-Skp2 complex:

Skp2 is a representative member of a second class of F-box adaptor proteins, which possesses a leucine
repeat domain in place of the WD40 repeat of Cdc4. In addition to providing a first structural view of a Skp1
homologue and an F-box domain, the structure of the Skp1/Skp2 complex revealed a mode of molecular association
predicted to be employed by all Skp1/F-box homologues. The Cdc4/Skp1/CPD structure confirms the fold of the
individual Skp1 and F-box domains and their mode of association. Superposition of yeast and human Skp1 strands
β1-3 and helices α1 to α7 (RMSD Cα = 0.74 Å) reveals a close correspondence between F-box helices α1 to α3,
with only Skp1 helix α8 and F-box helix α4 showing significant deviations between the two structures. In addition,
only the first half of helix α8 is ordered in ySkp1 and only a half-turn fragment of the F-box helix α4 is apparent in
Skp2. The differences in positions and lengths of F-box helices α4 and Skp1 helices α8 reflect the different roles
these secondary structure elements play in the linkage between their respective F-box and ligand binding domains.

The structure of the Skp1/Skp2 complex revealed a solid/substantial linkage between its Leucine Repeat and
the F-box domains, a feature predicted to be shared by all Skp2 F-box orthologues. In Skp2, the F-box domain helix
α4 terminates abruptly and, without an appreciable linker, makes an immediate transition to the Leu Repeat domain
fold. This linkage is enhanced by a β-strand projecting back from the C-terminus of the Leu repeat domain and helix α8
projecting forward from Skp1. The sum of linker region interactions composes a local hydrophobic network that links
the hydrophobic cores of the F-box domain with that of the LRR domain. This contrasts sharply with the
corresponding linkage of Cdc4, which is composed primarily of a lengthy inter-domain linker (the helical extension)
and which lacks significant involvement of Skp1 or the WD40 repeat domain for stabilization.

Although the Skp2 and Cdc4 F-box adaptor proteins employ structurally divergent ligand binding domains,
the general positions of the WD40 and LRR domains are surprisingly similar. The precise ligand-binding site on Skp2
has not been determined, but mutagenesis studies on the Skp2 orthologue in Met30 have mapped the ligand binding
site to the inner side of the curved surface. If the Skp2 binding site is inferred from the overlap with the Cdc4 CPD
binding site, the CPD site would map to the lateral side of the Leu repeat domain.
Mutational analysis of CPD binding surface

In order to probe the functional importance of amino acid residues on the highly conserved peptide binding
surface, a panel of Cdc4 mutants (both single and double mutants) was generated, and each was tested for its ability to
[illegible] a cell viability assay. Of 12 single site mutants tested, only Arg467Ala, Arg485Ala, Arg534Ala and Trp426Ala
abolished both cell viability in vivo and phosphoSic1 binding in vitro. Together, these residues compose most of the
interaction surface with the pThr-Pro CPD core. Interestingly, Tyr 548, the only other amino acid on the surface of Cdc4 to
directly contact the P0 phosphate group, is functional in vivo but is compromised for CPD binding in vitro. Mutation of
the adjacent residue Arg 572 to Ala shows the same behavior. For the Arg 572 mutation, the inability to bind pSic1 in
vitro appears to be due to its tendency to aggregate. Presumably, in the context of the full SCF complex in vivo, this
mutant is sufficiently well behaved to bind phosphoSic1.

All other single site mutants, including Arg443Ala, Lys402Ala, Tyr574Ala, Trp717 and Val384,
and the double site mutants Thr441/465Ile, K404D/R443D and V384N/W717N, are viable when expressed in the cdc4
deletion strain and are fully competent for phosphoSic1 binding in vitro.

Since the cell viability assay may be masking subtle functional roles for the conserved Cdc4 residues,
function was assayed in vivo under more stringent conditions, in which wild type Sic1 or the stabilized mutants Sic1(T33V)
or Sic1(T45A) are overexpressed under a galactose promoter. This should amplify defects in Cdc4 function. Under
these conditions, Trp 717, Tyr 548 and the double mutant K404D/R443D are lethal, showing that these residues are in
fact important for function.
Role of the Stem and pedestal structure 

To probe the role of the F-box WD40 inter-domain linker, point mutations, insertions or deletions were
introduced into the stem and pedestal structure of [illegible]
binding site mutants.

Deletion of helix α5 or introduction of Proline and Glycine helix destabilizing residues within the helix had
no effect on Cdc4 function both in vitro and in vivo. This result is consistent with the poorly conserved nature of
helix α5 and its flanking linker regions. Helix α5 appears entirely absent from human, mouse and Drosophila
homologues, and helix destabilizing substitutions in helix α5, incorporating glycine and proline, are observed in the
[illegible] homologues (Figure 1). A more invasive deletion of helix α5 that deletes part of the linkers to helix α4
[illegible] and phospho Sic1 in [illegible]
of Cdc4 to bind pSic1 and Skp1. These results are consistent with a possible role for helix α6
[illegible] bound substrates in a specific geometric orientation.

[illegible] additional contacts help stabilize the position of the WD repeats with respect to the other parts of the
protein [illegible] for the catalytic mechanism.

Probing the basis for selectivity against positively charged residues

Cdc4 binds Sic1 in a multi-site dependent manner. Each of the phosphorylation sites [illegible]
was observed for a double mutant [illegible]
Cancer causing mutations in Drosophila and human Cdc4

Mutations in human and fly orthologues of yCdc4 give rise to cancers (see Table below). Mis-sense
mutations map to the WD40 CPD binding domain and either have been demonstrated or are predicted to [illegible]
binding function. In previous studies, two cancer cell lines tested positive for mutations at Arginine 534 and Arg 467
(Arg 534 and Arg 467 in yCdc4). [illegible] phospho group and our mut[illegible]

In another study, two endometrial cancerous tissue samples tested positive for mutations equivalent to Arg 467 and
Arg 485 in yCdc4. As for the tumor cell line mutations, these mutations affect key residues required for CPD
recognition.

Two mutations characterised in Drosophila cancer include Ala1118Val and Gly1132Glu, corresponding to
yCdc4 positions Ser 532 and Gly 546 respectively. The first of these mutations involves the substitution of a small
Ala/Ser residue with a bulkier β-branched Valine residue. This may compromise CPD binding function through steric
effects on the position of the Arg 485, Arg 467, Arg 534 triad. In the crystal structure, Ala/Ser is positioned centrally
amongst the triad. The second Drosophila mutation, Gly1132Glu, maps to β-strand 15 of propeller blade 4 in yCdc4.
This position is within the core of the protein, and mutation here likely acts by disrupting the overall WD40 domain
fold or through local perturbations of structure that indirectly affect the phosphate binding pocket. Glycine in this
position of the WD40 repeat motif is highly conserved. The temperature sensitive alleles previously characterized,
including Gly398Glu in propeller blade 1 and Ser438Asn in propeller blade 2, likely act in a similar manner by
disrupting the overall WD fold. These are more distantly located from the CPD binding pocket.
Cancer Mutations

H-cell lines     Drosophila            Endometrial      Orlicky      Rosamond
Arg534(425)Leu   Ser/Ala532(1118)Val   Arg467(465)His   Arg534Ala    Gly398Gln
Arg467(385)Cys   Gly546(1132)Glu       Arg485(479)Gln   Arg467Ala    Ser438Asn
                                                        Arg485Ala
                                                        Trp426Ala


Discussion 

Recognition of phosphorylated substrates by the ubiquitin system. 

Substrate selection by Cdc4. The structure of the Skp1-Cdc4-CPD complex reveals the basis for
phosphorylation-dependent recognition, the specificity of which is governed by three primary determinants. The substrate
phospho-threonine is locked in place by direct contacts with three conserved and essential Arg residues. The preference for
hydrophobic residues at the P-1 position (and perhaps P-2 position) is enforced by a hydrophobic pocket that lines the
center of the WD40 propeller. Finally, the bias against basic residues at P+2 to P+5 is established by two conserved
Arg residues positioned on the top of the propeller directly in-line with the axis of the bound peptide. These
conclusions are supported by mutagenesis of key residues in Cdc4 and by structure-based engineering of Cdc4 to
accept sub-optimal CPD sequences.

The construction of the Cdc4 phospho-peptide binding module differs from that of known phospho-Ser/Thr
binding modules in an important respect. Known phospho-recognition domains, such as 14-3-3, WW and FHA
domains, appear to be composed of a series of dedicated interaction sites, each of which contributes incrementally to
the overall binding interaction (Yaffe and Elia, 2001). The Cdc4-substrate interaction is dominated by extensively
coordinated phospho-Thr and Pro residues, as well as by a striking positive electrostatic potential around the binding
site. The hydrophobic pocket that selects residues in the P-2 and P-1 positions also contributes to binding affinity. In
contrast to other phospho-recognition modules, however, the strong binding of the phosphorylated residue is partially
offset by specific selection against basic residues in the substrate peptide, through electrostatic repulsion from a basic
patch downstream of the phosphate binding pocket. These features allow the binding affinity for any given peptide to
be precisely tuned. Thus, all of the natural CPD motifs in Sic1 are sub-optimal in one or more respects; indeed only
peptides derived from the T45 site exhibit any detectable interaction with Cdc4 (Nash et al., 2001). These features
establish a requirement for substrate phosphorylation on multiple sites, which mediate a high affinity interaction in a
manner that depends cooperatively on the number of phosphorylated residues.

In the case of wild type Sic1, at least six sites must be phosphorylated for high affinity binding by Cdc4. As
shown here, mutation of the basic selection residues shifts the binding equilibrium to lower phosphorylated forms,
while in previous studies, it was demonstrated that introduction of a single optimal CPD into Sic1 causes premature
Sic1 degradation and genome instability (Nash et al., 2001). An advantage of this system is that not only can the affinity
of individual sites be tuned over a broad range, but the number and spacing of sites can be readily varied to establish a
threshold for the targeting kinase. Thus Cdc4 is able to target numerous critical factors for phosphorylation-dependent
degradation, including the Cdk inhibitor Sic1, the polarization factor Far1, the replication initiator Cdc6 and the
transcription factor Gcn4, all of which may be controlled with different kinetics and different phosphorylation
thresholds (Deshaies, 1999). These properties distinguish Cdc4 from other known phospho-peptide binding modules
that typically interact with dedicated sites on their substrates through a single high affinity interaction (Pawson and
Nash, 2000; Yaffe and Elia, 2001).

The mechanism that engenders a cooperative binding effect remains to be determined. In principle, multiple
interaction sites might increase binding either by engaging more than one binding site on Cdc4, or by decreasing the
probability of dissociation from Cdc4 (Deshaies and Ferrell, 2001; Harper, 2002; Nash et al., 2001). Cooperative
interactions for the dual SH2 domain phosphatase SH-PTP2 and 14-3-3ζ rely on two substrate binding sites for high
affinity recognition of bivalent ligands (Eck et al., 1996; Yaffe et al., 1997). Notably though, inspection of the WD40
surface does not reveal any other potential ligand binding pockets or grooves that might accommodate a
phosphorylated peptide motif. Although secondary weak phospho-dependent interactions might occur, it is not
obvious from the structure where such putative secondary sites might be located. In favor of the probabilistic
cooperativity effect, mathematical modeling suggests that cooperative behaviour arises for the interaction between a
single binding site and a polyvalent ligand as a function of the number of ligand sites. In effect, multiple ligand sites
increase the local concentration of ligand beyond a diffusion limited threshold for escape from the receptor. In the
absence of candidate secondary sites, the simplest model is favored, in which Cdc4 contains only a single
phospho-dependent binding site.
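
The intuition behind a single binding site engaging a polyvalent ligand can be illustrated with a deliberately simple toy calculation. It is not the mathematical model referred to above, and it is offered only as a sketch under stated assumptions: after each dissociation event, each of the n ligand sites is assumed to rebind the receptor independently with the same probability `p_rebind`, and the ligand escapes only if none of them does. Both the independence assumption and the parameter `p_rebind` are hypothetical.

```python
def escape_probability(n_sites: int, p_rebind: float) -> float:
    """Chance that none of n independent sites rebinds after one dissociation event (toy model)."""
    if n_sites < 1:
        raise ValueError("n_sites must be at least 1")
    if not 0.0 <= p_rebind < 1.0:
        raise ValueError("p_rebind must satisfy 0 <= p_rebind < 1")
    return (1.0 - p_rebind) ** n_sites


def expected_binding_events(n_sites: int, p_rebind: float) -> float:
    """Mean number of binding events before escape: the mean of a geometric distribution."""
    return 1.0 / escape_probability(n_sites, p_rebind)


# Hypothetical rebinding probability of 0.5 per site, for one to six sites.
for n in range(1, 7):
    print(n, expected_binding_events(n, 0.5))
```

In this toy picture the expected number of binding events before escape grows geometrically with the number of sites rather than linearly, which is one way a steep dependence on the number of phosphorylated sites could arise from a single receptor site.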

Comparison to other phospho-peptide binding domains. The structure of the Cdc4 WD40 domain provides direct
evidence that WD40-type repeats can assemble into propellers with more than seven blades (Fulop and Jones, 1999).
One consequence of the additional blade is an enlarged channel through the center of the propeller, which creates a
wide binding pocket that accommodates the core Leu-pThr-Pro ligand. This pocket contrasts with all other
phospho-Ser/Thr binding domains, which engage their ligand through more shallow surface contacts within loops that extend
from the core domain. WD40 domains are known to interact with other proteins in at least two different modes. In the
Gβ transducin and TUP1 WD40 domains, the protein interaction region occurs across the top of the propeller, much as
in the case of Cdc4 (Sprague et al., 2000; Wall et al., 1995). In a second mode, defined for the WD40 domain of
clathrin and the β-arrestin peptide, a "peptide-in-groove" interaction occurs on the bottom edge of the propeller
between the β-strands of the second blade (ter Haar et al., 2000). Modeling of β-TrCP, which binds the consensus
motif DpSGXXpS [SEQ ID NO.42] in IκBα, β-catenin and Vpu (Yaffe and Elia, 2001), suggests that an extensive
basic region on the top of the propeller will engage substrate peptides in an analogous manner to Cdc4.
Spatial orientation of SCF substrates. A feature conserved among all E3 structures solved to date is the large
distance between the substrate binding site and the catalytic site (Huang et al., 1999; Zheng et al., 2002; Zheng et al.,
2000). Modeling of the Skp1-Cdc4 complex onto a model of the Skp1-Cul1-Rbx1-E2 complex suggests that the
substrate is positioned for direct frontal attack by the E2 catalytic site but that a gap of about 65 Å must be
bridged between the two sites, presumably by the substrate polypeptide. Unexpectedly, superposition of the WD40
domain of Cdc4 with the LRR of Skp2 does not align the defined phosphopeptide binding pocket of Cdc4 with a
potential phospho-recognition site on the concave face of the LRR repeats (Zheng et al., 2002), at least as defined
by mutational analysis of the related F-box protein Grr1 in yeast (Hsiung et al., 2001). If the relative position of
substrates in the WD40 versus LRR class of F-box proteins differs, spatial plasticity in substrate presentation must be
possible. This notion is consistent with the fact that the HIV protein Vpu is able to redirect the specificity of the F-box
protein β-TrCP by bridging β-TrCP to the host cell protein CD4, in a manner that depends on phospho-dependent
recognition of Vpu by β-TrCP (Margottin et al., 1998). Similarly, it is possible to create synthetic adapters that bridge
the substrate recognition site of an F-box protein to an ectopic substrate (Sakamoto et al., 2001). Finally, by definition
all E3s must be able to accommodate the substrate and the elongating ubiquitin chain generated by repeated catalytic
cycles (Pickart, 2001). All of these points argue for considerable spatial leeway, and possibly flexibility, of F-box
protein orientations within the SCF catalytic cavity.

Based on the extensive Skp1-Skp2 interface, and on the inactivation of Cul1 by insertion of a flexible linker,
it has been proposed that SCF complexes, and perhaps E3 enzymes in general, must present substrates to the catalytic
site in a rigidly defined fashion (Zheng et al., 2002). However, the WD40 domain and the F-box of Cdc4 are linked
only by a single α-helical stalk, with additional surface contact between the domains, all of which is mediated by
non-conserved residues. It is thus somewhat difficult to reconcile the properties of the two F-box protein structures solved
to date. Although regions truncated from Cdc4 to enable crystallization may normally help stabilize the
interface, none of these regions are highly conserved between closely related Cdc4 family members. Perturbation of
the rotational and translational position of the WD40 domain by introduction of additional residues into the stalk
abrogates function in all cases, except for a long insertion of 12 residues. The fact that this gross structural change can
be tolerated implies a degree of conformational plasticity within the catalytic cradle. This plasticity may facilitate the
access of multiple ubiquitination sites within Sic1 to the catalytic center, as directed by the multiple low affinity CPD
motifs in Sic1.

Insights into substrate recognition by human Cdc4. In metazoans, Cdc4 targets multiple critical regulators of cell
division and development. Among these, cyclin E is a crucial substrate because its abundance must be strictly
controlled in order to avoid precocious S phase entry and attendant genome instability (Spruck et al., 1999). Notably,
it has been recently reported that mutational inactivation of hCDC4 occurs in several cancer cell lines that exhibit high
levels of cyclin E (Moberg et al., 2001; Strohmaier et al., 2001). In addition, hCDC4 may be mutated in up to 30% of
endometrial cancers (Spruck et al., 2002). Quite strikingly, known cancer associated mutations in hCDC4 alter
phosphoThr-binding residues. Given the probable requirement for homodimerization in active SCF complexes
(Kominami et al., 1998; Suzuki et al., 2000), such mutations might be expected to act in a partial dominant negative
manner. Other critical substrates that appear to bind Cdc4 in a phosphorylation dependent manner include SEL-10, a
negative regulator of the LIN-12/Notch pathway (Hubbard et al., 1997) that targets the transcriptionally active Notch
intracellular domain for degradation (Gupta-Rossi et al., 2002; Wu et al., 2001) and the presenilins, dominant
mutations in which predispose to familial early onset Alzheimer's disease (Selkoe, 2001; Wu et al., 1998). Mutations
that interfere with hCdc4 activity may therefore compound multiple disease phenotypes.
Yeast and human Cdc4 exhibit a high degree of structural similarity, especially in the critical substrate
binding region, and moreover, Cdc4 family members are functionally conserved since the hCdc4 substrate cyclin E is
efficiently degraded in yeast in a CDC4-dependent manner (Koepp et al., 2001; Nash et al., 2001; Strohmaier et al.,
2001). The structure of yeast Cdc4 thus affords insights for rational drug design. Significantly, the low affinity of
individual natural CPD sites, which engenders the requirement for multisite phosphorylation, means that even compounds
of moderate affinity can readily out-compete the binding of fully phosphorylated substrates (Nash et al., 2001).
Naively, inhibition of hCdc4-substrate interactions would be expected to exacerbate the deregulated proliferation
caused by stabilization of cyclin E, Notch-IC or presenilins. However, if Cdc4 or Cdc4-like activities are limiting for
growth, Cdc4 antagonists may have heightened toxicity in cells that are hypomorphic for Cdc4 function.
Alternatively, disruption of hCdc4 function may cause synthetic lethal effects in combination with otherwise
non-lethal mutations in functionally overlapping pathways (Tong et al., 2001).
EXAMPLE 2 

The following methods were used in the investigation described in the example: 
Protein expression and purification. The Cdc4 fragment employed for crystallization was deleted for residues
1-262, 602-605, 609-624 and 745-779 to remove loop regions based on sequence alignments and limited proteolysis of
the intact SCF Cdc4 complex. Skp1 was deleted for a non-conserved loop insertion spanning residues 37-64. A
GST Skp1-His6 Cdc4 complex was co-expressed from plasmid pMT3169 in B834 (DE3) bacterial strain (Stratagene)
cells grown in minimal media supplemented with a mixture of selenomethionine (40 µg/ml) and methionine
(0.4 µg/ml) and purified by double affinity tag chromatography (Nash et al., 2001). All mutations were constructed by
standard methods using oligonucleotides listed in Table 7 and sequence verified in their entirety. Mutants were
sub-cloned into pMT3055 or pMT3217 for expression in bacteria or yeast, respectively, as listed in Table 8. The WD40
domain of the helix α6 linker mutants Ala1, Ala2, Ala12 and helix α6 breaker could not be stably expressed in
bacteria; the Ala12 mutant also could not be expressed in yeast.

Crystallization, data collection, structure determination and modeling. Hanging drops containing 1 µl of 20
mg/ml protein and 1.2 molar equivalents of the cyclin E derived CPD peptide
(acetyl-Gly-Leu-Leu-pThr-Pro-Pro-Gln-Ser-Gly-amide) [SEQ ID NO.40] in buffer (10 mM HEPES pH 7.5, 250 mM NaCl, 1 mM DTT)
were mixed with an equal volume of reservoir buffer (0.1 M Tris pH 8.5, 1.5 M ammonium sulphate). Crystals of the
space group P3₂ (a = 107.7 Å, b = 107.7 Å, c = 168.3 Å, α = β = 90°, γ = 120°), with two Cdc4-Skp1-CPD complexes
in the asymmetric unit, were obtained at 20°C. A Multiple Anomalous Dispersion (MAD) experiment was performed on a
frozen crystal at the Advanced Photon Source (Argonne, IL) beamlines BM 14-C and BM 14-D (λ1 = 0.9798 Å,
λ2 = 0.9800 Å, λ3 = 0.9000 Å) using a Quantum 4 ADSC CCD detector. Data processing and reduction were carried out
with the HKL program suite (Otwinowski and Minor, 1997). The programs SHARP (de La Fortelle and Bricogne, 1997) and SnB
(Miller et al., 1994) were used in combination to locate and refine 19 of the total 22 selenium sites. Following phasing
and density modification, a model was built using O (Jones et al., 1991) and refined to 2.7 Å resolution with NCS
restraints using CNS (Brunger et al., 1998) to a working R value of 23.8% and R free of 27.3%. Pertinent statistics for data
collection and refinement are shown in Table 2. Amino acids 37-74 and 104-115 of Skp1 and amino acids 497-507 of
Cdc4 were disordered and could not be modeled. 89.1% of the residues occupy the most favored regions of the
Ramachandran plot, 10.8% the additional allowed region and 0.2% the generously allowed region.
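
For orientation only, the volume of the unit cell implied by the reported cell edges can be computed with the general (triclinic) cell volume formula. The sketch below assumes cell edges of 107.7 Å, 107.7 Å and 168.3 Å with two angles of 90° and one of 120°; when two angles are 90° the formula gives the same volume whichever angle carries the 120°. The function name `unit_cell_volume` is an assumption of this sketch.

```python
import math


def unit_cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """General (triclinic) unit-cell volume; edges in angstroms, angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha_deg, beta_deg, gamma_deg))
    # V = abc * sqrt(1 - cos^2(alpha) - cos^2(beta) - cos^2(gamma) + 2 cos(alpha) cos(beta) cos(gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)


volume = unit_cell_volume(107.7, 107.7, 168.3, 90.0, 90.0, 120.0)
print(f"unit cell volume: {volume:.3e} cubic angstroms")
```

With these values the cell volume is roughly 1.69 × 10^6 Å³.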

Ribbon representations were generated using Ribbons (Carson, 1991), surface representations were
generated using Grasp (Nicholls et al., 1991) and electron density maps were generated using O (Jones et al., 1991). A
model of the ubiquitin-E2-SCF Cdc4-CPD complex was generated by superposition of the Skp1 subunits of the
Skp1-Cdc4-CPD structure and the Skp1-Cul1-Rbx1 structure (PDB ID 1LDK) (Zheng et al., 2002), the RING finger
domains from Rbx1 in the same Skp1-Cul1-Rbx1 complex and from the Cbl subunit of the Cbl-UbcH7 structure
(PDB ID 1FBV) (Zheng et al., 2000), and the E2 subunits of the Cbl-UbcH7 structure and an NMR-based
Ubc1-ubiquitin model (PDB ID 1FXT) (Hamilton et al., 2001). The Skp1, RING domain and E2 subunits overlapped with
RMSD values of 1.01 Å, 2.09 Å and 2.04 Å respectively.
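
As a minimal sketch of how an RMSD value of the kind quoted above is defined, the following Python function computes the root mean square deviation between two lists of matched atomic coordinates. It assumes the two coordinate sets have already been superposed and paired atom by atom; it does not perform the superposition itself, and it is not the procedure of any of the programs cited above. The function name `rmsd` is an assumption of this sketch.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two hypothetical atom pairs: the first pair coincides, the second differs by 2 angstroms in z.
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]))
```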

Cdc4 functional assays. CDC4 mutant alleles were assessed for complementation of a cdc4Δ strain in a plasmid
shuffle assay (Nash et al., 2001). Sensitivity to SIC1 dosage was determined by transformation with pMT837
(GAL1-SIC1) or pMT767 (GAL1-SIC1^T33V) and plating on glucose medium or galactose medium. For in vitro capture of
phospho-Sic1 by Cdc4, 0.5 µg of bacterially-expressed HIS6-Sic1 was phosphorylated with immobilized Cln2-Cdc28
kinase from baculovirus-infected Sf9 cells and then incubated with 1 µg of immobilized wild type or mutant
Cdc4^263-744-GST-Skp1 at 4°C for 1 hr, washed 4 times and visualized by anti-Sic1 immunoblot. For isoelectric focusing
(IEF)-2D gel analyses, an evenly distributed pool of phospho-Sic1 isoforms was generated by combining different
time points in a Sic1 phosphorylation reaction. 2.5 µg of the phospho-Sic1 pool was bound to 5 µg of immobilized
wild type or mutant Cdc4^263-744-GST-Skp1. Captured isoforms were separated by denaturing IEF-2D gel
electrophoresis using pH3-10NL Immobiline gel strips (Amersham) and visualized by anti-Sic1 immunoblot.
Alternatively, the pool of phospho-Sic1 isoforms was incubated in solution with a ubiquitination reaction mix
containing ATP, ubiquitin, yeast E1, Cdc34 and either wild type or mutant SCF^Cdc4 complex, composed of a 1:1 ratio
of bacterial Cdc4-GST-Skp1 and insect cell-produced Cdc53-Rbx1, at 30°C for 1 h as previously described (Nash et
al., 2001).

[...] were used to deduce loop [...] obtained that diffracted to 2.7 Å (Table 2). [...] WD40 domain containing F-box
protein family members (Wolf et al., 1999). The high affinity CPD [...] corresponds to nine residues of human
cyclin E, Gly-Leu-Leu-pThr-Pro-Pro-Gln-Ser-Gly (SEQ ID NO. 40), which binds Cdc4 with a Kd of 1 µM (Nash et al., 2001).

The F-box interface. Yeast Skp1 forms an elongated structure with a mixed α/β topology identical to that reported
for human Skp1 (Schulman et al., 2000) and consists of a three-strand β-sheet, denoted β1 to β3, and eight α-helices
denoted α1 to α8 (Figure 2A). The structure of Cdc4 consists of an F-box domain, an α-helical linker, and a WD40
domain. [...] slightly from that reported for the F-box domain of hSkp2, which consists of a loop region L1 and three
helices denoted α1 to α3 (Schulman et al., 2000). Helix α0 in Cdc4 corresponds most closely in sequence and position to
the loop region L1 of hSkp2, while a turn remnant of Cdc4 helix α4 is discernible in the transition sequence between
the hSkp2 F-box and the LRR domain. As observed in the Skp1-Skp2 complex, ScSkp1 and the F-box domain of
Cdc4 associate by interdigitation of helices α0 to α3 of Cdc4 with helices α5 to α8 of Skp1, with the interface itself
composed of an inter-protein 4-helix bundle. This mode of association gives rise to a contiguous hydrophobic core
that spans Skp1 and the F-box domain of Cdc4. Superposition of the yeast and human structures reveals that Skp1
[...] half turn fragment of the F-box helix α4 is apparent in hSkp2 (Figure 6A). The difference in position and length of
F-box helix α4 and Skp1 helix α8 reflects the different roles these secondary structure elements play in the linkage
between their respective F-box and ligand binding domains, as described below.

The WD40 domain. Eight copies of the WD40 repeat motif in Cdc4 form an eight-blade β-propeller structure (Figure
6B). The WD40 repeat motif of approximately 40 residues composes the outer β-strand of one propeller blade and the
inner three strands of the adjacent blade in a continuous circular arrangement (Fulop and Jones, 1999). The actual
Cdc4 structure contrasts with the 7-blade β-propeller predicted for Cdc4 and its orthologs based on previously solved
WD40 domain structures [...]
is attributable to the cryptic nature of the 8th WD40 repeat motif. Structure-based sequence alignment suggests that
the WD40 domains of the F-box proteins Met30 and β-TrCP will form canonical 7-blade β-propeller structures
(Figure 1B). A variant five β-strand structure occurs in blade 2, in which a large insert in the β12-β13 linker allows



the outermost β9' strand to run parallel to the β9 strand. This five-strand composition is unique to the fungal Cdc4
orthologs. In terms of overall structural dimensions, the WD40 domain resembles a conical frustum with a top surface
40 Å in diameter, a bottom surface 50 Å in diameter, an overall thickness of 30 Å and a central pore of 6 Å diameter.
The CPD binding site resides on the top surface of the frustum and runs across the edge of the pore, while the bottom
surface of the frustum links to the F-box domain.

The F-box to WD40 domain linker. The F-box domain of Cdc4 is followed by a helical extension that forms a
structured bridge to the WD40 domain. The bridge consists of two α-helices, α5 and α6, that together with helices α3
and α4 of the F-box domain form a platform and stalk-like structure that positions the WD40 domain well away from
the F-box domain (Figure 2A, C). The relative orientation of the F-box domain and WD40 domain is imposed almost
entirely through the integrity of the stalk-like helix α6, which is 30 Å in length. The N-terminal end of helix α6 is
anchored into the hydrophobic core of the F-box domain through interactions involving α6 residues Phe 355 and Leu
356 and F-box residues Ile 295, Ile 296, Leu 315, Trp 316, and Leu 319 (Figure 2C). Helix α5 packs alongside
the base of helix α6 opposite to the F-box domain through hydrophobic interactions involving Tyr 342, Leu 338 and
Leu 334. The C-terminal end of helix α6 inserts obliquely between propeller blades 7 and 8 of the WD40 domain
through van der Waals and hydrophobic interactions involving residues Trp 365 and Ile 361 with WD40 domain
residues Val 687, Ile 696, Leu 726 and Phe 743 in β-propeller blades 7 and 8. Asn 364 of helix α6 also forms a tight
hydrogen bond with the backbone carbonyl group of Phe 743 in propeller blade 8. The conservation of many of these
residues, with the possible exception of those within helix α5, suggests that a structured linkage between the WD40
and F-box domains may be a common feature of the WD40 family F-box proteins.

The interdomain connection between the F-box and the WD40 domains of Cdc4 appears less rigidly
structured than the corresponding region in hSkp2 (Figure 6A). Outside of the stalk helix α6, only two close contacts
(< 3.5 Å) are observed between the WD40 domain and other regions of Cdc4 (Figure 2C). These contacts consist of
hydrogen bonds that link Asn684 and Arg700, in two loop regions of propeller blade 7, to Glu 323 in the α4-α5
linker of the helical extension. Both hydrogen bonds are maintained in the two Cdc4 molecules of the crystal
asymmetric unit but all three residues are poorly conserved amongst Cdc4 orthologues (Figure 1B). The lack of
additional stabilizing interactions suggests that the F-box to WD40 domain linker is not exceedingly rigid, and indeed,
the WD40 domains in the two Cdc4 molecules of the crystal asymmetric unit differ relative to their F-box domains by
a 5° rotation about the long axis of helix α6. In contrast, in hSkp2 the F-box domain helix α4 terminates abruptly in
an immediate transition to the LRR domain fold such that the adjoined domains form a rigid hydrophobic core
(Schulman et al., 2000). Although the Skp2 and Cdc4 families of F-box proteins employ structurally divergent F-box
interfaces, the general positions of the WD40 and LRR domains are nonetheless similar (Figure 6A).
Model of the SCF^Cdc4-E2 complex. The structure of the Skp1-Cdc4-CPD complex sheds light on how substrates are
presented by the F-box protein to the E2 for ubiquitin transfer. A complete model of the E2-SCF^Cdc4-substrate
complex consisting of ubiquitin, hUbc7, hCul1, hRbx1, ScCdc4, ScSkp1, and the CPD peptide is shown in Figure 6B.
This model is based on the reconstructed E2-SCF^Skp2 complex derived by Pavletich and colleagues (Zheng et al.,
2002), in conjunction with an NMR-based ubiquitin-E2 thioester model (Hamilton et al., 2001). Two interesting
features are apparent. First, the distance between the E2 active site cysteine and the phosphate group of the bound
CPD peptide is approximately 59 Å, which is similar to the spacing reported between the substrate interaction site and
the E3 catalytic site in the hUbc7-Cbl structure (Zheng et al., 2000). Second, the WD40 domain presents the CPD
peptide in a direct line-of-sight to the E2. Although the ligand-binding site on hSkp2 has not been determined,
mutagenesis studies on the LRR-containing F-box protein Grr1 in yeast suggest that substrates bind to the inner side
of the curved repeat surface (Hsiung et al., 2001). If the position of this site is maintained in hSkp2, then the LRR
domain of Skp2 is predicted to project substrates in an orthogonal direction to that of the Cdc4 WD40 domain (Figure
6A).

Phosphopeptide recognition. The CPD binding surface represents the most conserved part of the WD40 repeat
domain structure (Figure 7A-D). The central CPD sequence Leu-pThr-Pro-Pro [SEQ ID NO. 41] was modeled
unambiguously in unbiased experimental electron density maps in both Skp1-Cdc4-CPD complexes of the crystal
asymmetric unit (Figure 3). Interpretable electron density is also apparent for the P-2 Leu, P+3 Gln, P+4 Ser, and P+5
Gly positions, but only in one complex of the crystal asymmetric unit. The CPD peptide binds in an extended manner
across β-propeller blade 2 with the N-terminus oriented towards the central pore of the WD40 domain and the
C-terminus oriented towards the outer rim. Identical substrate peptide orientations and contacts were observed for an
independent Skp1-Cdc4-CPD structure with a phosphopeptide derived from the transcription factor Gcn4, which is a
physiological substrate of Cdc4 in yeast (Meimoun et al., 2000; Chi et al., 2001). However, of the Gcn4 peptide
sequence, Phe-Leu-Pro-pThr-Pro-Val-Leu-Glu-Asp [SEQ ID NO. 43], only the core residues Pro-pThr-Pro had
discernible electron density.

The CPD sequence requirements for the CPD-Cdc4 interaction are fully accounted for by structural elements
in the WD40 domain. An absolute requirement for phosphorylation at Ser or Thr at the P0 position of the CPD
derives from a network of electrostatic interactions and hydrogen bonds that coordinate the P0 pThr phosphate group
(Figure 7C, D). This interaction is mediated by residues that are conserved across all Cdc4 orthologs (Figure 1B). The
P0 phosphate group forms direct electrostatic interactions with the guanidinium groups of Arg485, Arg467, and
Arg534 and a direct hydrogen bond with the side chain of Tyr548. The side chain of Tyr548 is coordinated by
stacking interactions with the guanidinium group of Arg572, which in turn is coordinated by a hydrogen bond to the
side chain of Tyr574. Although Cdc4 shows a 6-fold preference for pThr over pSer (Nash et al., 2001), the structural
basis for this selectivity is not obvious since the Cγ methyl group of Thr is directed towards solvent and does not make
contact with the WD40 domain surface.

A second absolute requirement for CPD-Cdc4 interaction rests on the P+1 proline, the side chain of which
projects into a three-sided pocket on the WD40 surface. One side of this pocket is formed by the side chain of Trp
426, which packs in a coplanar manner with the P+1 proline side chain. The opposite side of this binding pocket is
formed by the side chain of Arg 485 via coordination of the proline side chain and backbone carbonyl group through
van der Waals and hydrogen bonding interactions, respectively. The side chains of Thr 441 and Thr 465 define the
remaining side of the P+1 proline binding pocket, with Cγ side chain groups composing a hydrophobic surface. The
hydroxyl groups of Thr 441 and Thr 465 orient away from the P+1 binding pocket, where they are well placed to influence
binding specificity for CPD residues C-terminal to the P+1 position. Unlike Trp 426 and Arg 485, which are invariant
amongst the Cdc4 orthologs, Thr 441 and Thr 465 are both substituted with Ile in the S. pombe Cdc4 ortholog Pop1
(Figure 1B). This substitution might restrict CPD sequences able to bind Pop1 through steric or hydrophobic
constraints on residues C-terminal to the P+1 proline position.

Cdc4 displays a strong preference for the hydrophobic residues Leu/Ile/Pro at the P-1 and Leu/Ile at the P-2
CPD positions. In the crystal structure, the P-1 leucine side chain fits into a hydrophobic pocket composed of
invariant residues Trp 426, Trp 717, and Thr 386, and the conserved hydrophobic residue Val 384. While less
precisely modeled, the main chain position of Leu -2 lies in close proximity to a third hydrophobic pocket composed
of the invariant residue Tyr574, and the conserved hydrophobic residues Met 590 and Leu634. The hydrophobic
character of the P-1 and P-2 pockets is manifest as selection against both charged and small polar residues at these
positions in the CPD consensus (Nash et al., 2001).

The WD40 phospho-recognition domain of Cdc4 is unusual in that it exhibits strong selectivity against either
Arg or Lys residues in the P+2 to P+5 CPD positions, but otherwise shows no sequence preference at these positions
(Nash et al., 2001). In the crystal structure, the side chain of P+2 Pro is directed towards solvent, while the main chain
conformation of Pro+1 and Pro+2 causes the CPD to kink away from the peptide-binding surface from the Pro+2
position onward. As a result, only one additional contact with Cdc4 is made by the CPD following the Pro+1
position, namely a weak hydrogen bond with sub-optimal geometry between the P+3 Gln side chain and the side chain
of Arg 485. Because the P+1 Pro main chain is forced away from the WD40 domain surface, the selection against
basic residues in the P+2, +3, +4 and +5 positions in the CPD consensus is almost certainly due to through-space
electrostatic repulsion. This effect arises from a dominant positive electrostatic potential generated by both the
invariant triad of Arg residues that comprise the core pThr-Pro binding pocket, and by a radial extension of the surface
due to Arg 572, Arg 443 and Lys 402, the former two of which are conserved amongst Cdc4 orthologs (Figure 7B).
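
The positional preferences described in the preceding paragraphs can be summarized as a simple rule check. The sketch below is only an illustration under stated assumptions: the function name, the one-letter encoding and the all-or-nothing treatment of each rule are hypothetical simplifications, and the third test peptide is invented. It does not model binding affinity.

```python
def cpd_mismatches(seq, p0):
    """List departures of a candidate site from the CPD preferences described in the text.

    seq is a one-letter amino acid string and p0 is the index of the phosphorylated
    residue. The rules are a qualitative simplification, not a binding predictor.
    """
    def residue(offset):
        i = p0 + offset
        return seq[i] if 0 <= i < len(seq) else None

    issues = []
    if residue(0) not in ("S", "T"):
        issues.append("P0 is not Ser/Thr")              # absolute requirement
    if residue(1) != "P":
        issues.append("P+1 is not Pro")                 # absolute requirement
    if residue(-1) is not None and residue(-1) not in ("L", "I", "P"):
        issues.append("P-1 is not Leu/Ile/Pro")         # hydrophobic preference
    if residue(-2) is not None and residue(-2) not in ("L", "I"):
        issues.append("P-2 is not Leu/Ile")             # hydrophobic preference
    for offset in range(2, 6):
        if residue(offset) in ("K", "R"):
            issues.append(f"P+{offset} is basic ({residue(offset)})")  # selected against
    return issues


# Cyclin E-derived CPD and Gcn4 peptide from the text; phosphorylated Thr at index 3
print(cpd_mismatches("GLLTPPQSG", 3))
print(cpd_mismatches("FLPTPVLED", 3))
# Invented sub-optimal site, for illustration only
print(cpd_mismatches("GASTPKKG", 3))
```

Under these simplified rules the two peptides taken from the text report no mismatches, whereas the invented site fails the P-1 and P-2 preferences and carries basic residues at P+2 and P+3.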

A number of natural mutations detected in metazoan orthologs of Cdc4 corroborate the structure-based
analysis. Two ovarian cancer cell lines bear missense mutations at conserved Arg residues that correspond to Arg 467
and Arg 534 in yeast Cdc4 (Moberg et al., 2001). In the crystal structure, these residues make direct contact with the
P0 phosphate group and are essential for function (Figure 7C, D). In a recent study of human primary endometrial
tumors, mutations in phosphate-binding Arg residues equivalent to Arg 467 and Arg 485 were detected in 2 of 13
tumor samples (Spruck et al., 2002). Other cancer-associated nonsense and frameshift mutations truncate hCdc4
within the WD40 domain (Moberg et al., 2001; Strohmaier et al., 2001; Spruck et al., 2002). Similarly, all three
characterized mutations in the Drosophila ago gene that lead to excess cell proliferation affect the WD40 domain
(Moberg et al., 2001). One of these mutations, Ala1118Val, corresponding to position Ser532 in ScCdc4, substitutes a
conserved small residue with a bulkier residue at the center of the critical Arg485-Arg467-Arg534 triad (Figure 7C).

Mutational analysis of the F-box to WD40 domain linker. To probe the importance of orientation and rigidity in
the F-box-WD40 inter-domain linker, point mutations, insertions or deletions were introduced into the platform and
stalk structure of Cdc4. None of these alterations affected the ability of the recombinant proteins to bind phospho-Sic1
in vitro or protein abundance in vivo (Figure 8A and data not shown). Introduction of the helix-destabilizing residues
glycine and proline into helix α5 did not compromise Cdc4 function in vivo (Figure 8B), consistent with the poorly
conserved nature of this region (Figure 1B). However, two different deletions of helix α5 eliminated Cdc4 function in
vivo, indicating that the F-box-WD40 domain interface is an essential structural component. Similarly, placement of
helix-destabilizing residues at the center of helix α6 or the lengthening of this helix by the insertion of one, two, three,
four, eight or twelve amino acid residues disrupted Cdc4 function in vivo. Helix α6 is thus critical for productive
orientation of the WD40 domain.

Mutational analysis of the CPD binding surface. Previous mutational analysis based on sequence conservation in
the Cdc4 family identified Arg467, Arg485 and Arg534 as essential for substrate binding and function in yeast (Nash
et al., 2001). Two of the three corresponding residues in hCdc4, Arg 417 and Arg 457, are essential for the binding of
phospho-cyclin E, while the third, corresponding to Arg485, was not tested (Koepp et al., 2001). To systematically
probe the role of residues that form the highly conserved peptide binding surface, a panel of Cdc4 mutants was
generated and each was tested for pSic1 binding in vitro, complementation of a cdc4Δ strain and sensitivity to
increased SIC1 dosage. Four mutants, Arg467Ala, Arg485Ala, Arg534Ala, and Trp426Ala, were unable to bind
phospho-Sic1 in vitro or complement a cdc4Δ strain, but were fully competent for Skp1 binding (Figure 8A, B). The
essential function of these residues is not confined to elimination of Sic1 because none of the corresponding mutant
alleles were able to rescue a cdc4Δ sic1Δ strain. These results reflect the critical structural role played by these
residues in coordination of the P0 phosphate and the P+1 proline of the CPD. Mutation of the remaining
phosphate-coordinating residue, Tyr548, did not cause loss of viability but did result in dosage sensitivity to
SIC1^Thr33Val, which encodes a partially stabilized version of Sic1 (Figure 8C). Mutation of Arg 572 had the same
effect, as befits the observed stacking interaction between this residue and Tyr 548. Although both mutants were
severely impaired for binding to phospho-Sic1 in vitro, this effect may be exacerbated by the tendency of these
recombinant proteins to aggregate. In summary, the six residues that directly or indirectly coordinate the primary
pThr-Pro core motif are critical for CPD recognition in vitro and Cdc4 function in vivo.

Disruption of residues that confer selection at the P-2, P-1 and P+2 to P+5 positions had only modest effects
on the ability of Cdc4 to target pSic1. A Trp717Asn mutation predicted to disrupt the P-1 pocket conferred sensitivity
to dosage of SIC1^Thr33Val, but did not overtly affect the pSic1-Cdc4 interaction in vitro. Individual mutations in all
other residues that are well positioned to affect substrate selection, namely Arg443Ala, Arg443Asp, Lys402Ala,
Tyr574Phe and Val384Asn, were indistinguishable from wild type in each of the assays used. Substrate selection
residues on the WD40 surface thus contribute only modestly, if at all, to the essential function of Cdc4. As described
below, however, these residues play a subtle but critical role in setting the phosphorylation threshold for the
CPD-Cdc4 interaction.

Modulation of CPD substrate selectivity. A critical feature of the Sic1-Cdc4 interaction is the requirement for
phosphorylation of Sic1 on multiple sites. To enforce this requirement, each of the phosphorylation sites in the native
Sic1 sequence is sub-optimal in one or more respects (Figure 9A). The Cdc4-CPD structure suggests that selectivity
against basic residues may be due to electrostatic repulsion generated from the conserved patch of basic residues in
and around the CPD binding pocket, while selection for hydrophobic residues arises from the P-1 pocket that is
composed in part of Val 384 and Trp717. To examine the basis for selection against sub-optimal CPD motifs, the
effects of mutations in non-essential residues in these two regions on the multisite phosphorylation requirement for
Sic1 recognition were assessed.

The ability of Cdc4 to capture various phosphoisoforms of wild type Sic1 from a pool of recombinant Sic1
that had been phosphorylated to various extents by Cln2-Cdc28 was monitored. As resolved by isoelectric focusing,
this pool contained roughly equal amounts of Sic1 phosphorylated on 1, 2, 3, 4, 5, 6, 7, 8 and 9 sites. Wild type Cdc4
was only able to capture Sic1 phosphorylated on six or more sites (Figure 9B). This result formally demonstrates the
transition in binding affinity between 5 and 6 phosphorylation sites, as initially inferred from capture of a series of
Sic1 phosphorylation site mutants by Cdc4 (Nash et al., 2001). The role of positive electrostatic potential in selecting
against sub-optimal CPD sequences with basic residues at C-terminal positions was tested with the Lys402Ala
Arg443Asp double mutant. This mutant was able to select Sic1 phosphoisoforms that contained as few as three
phosphorylation sites (Figure 9B). The ability of the Lys402Ala Arg443Asp double mutant to capture lower
phosphorylated forms of Sic1 is also evident in one-dimensional SDS-PAGE (Figure 8A). Similarly, perturbation of
the P-1 hydrophobic pocket with a Val384Asn Trp717Asn double mutation allowed capture of Sic1 phosphorylated
on as few as four sites. These in vitro binding results were recapitulated in solution-based in vitro ubiquitination
assays with wild type and mutant forms of Cdc4. Both double mutant forms of Cdc4 were able to convert Sic1
phosphorylated on four or five sites to oligo-ubiquitinated species, whereas wild type Cdc4 was unable to do so (Figure
9C). The double mutants were, however, less efficient than wild type at elaborating fully ubiquitinated species of
phospho-Sic1, perhaps because of protein stability effects or interference with catalytic steps after substrate binding.
This interpretation is consistent with the sensitivity of strains bearing the double mutant alleles to SIC1^Thr33Val dosage
(Figure 8B). Overall, re-engineering of negative selection residues in the Cdc4 WD40 domain supports the notion that
the series of sub-optimal CPD motifs in Sic1 sets a high phosphorylation threshold for its recognition by Cdc4.
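
The capture thresholds reported above can be restated compactly. The following sketch simply encodes the minimum number of phosphorylated sites at which each Cdc4 variant captured Sic1 in the binding assay and lists which isoforms of the 1 to 9 site pool would be retained. The dictionary and function names are hypothetical, and the hard cut-off is an idealization of the gel data rather than a quantitative model.

```python
# Minimum number of phosphorylated Sic1 sites captured by each Cdc4 variant,
# as stated in the text for the in vitro capture assay (Figure 9B).
CAPTURE_THRESHOLD = {
    "wild type": 6,
    "Lys402Ala Arg443Asp": 3,
    "Val384Asn Trp717Asn": 4,
}


def captured_isoforms(variant, pool=range(1, 10)):
    """Return the phospho-isoforms (by site count) at or above the variant's threshold."""
    threshold = CAPTURE_THRESHOLD[variant]
    return [n for n in pool if n >= threshold]


for name in CAPTURE_THRESHOLD:
    print(name, captured_isoforms(name))
```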
Discussion 

The structure of the Skp1-Cdc4-CPD complex provides direct visualization of substrate orientation within an
SCF complex. Insights gained from the structure include the unexpectedly frail interface between the F-box and the
WD40 repeat domain, the basis for dedicated pThr-Pro dipeptide recognition by a novel eight-blade WD40 propeller,
and a detailed understanding of the basis for selection against natural CPD sequences. The latter feature appears to be
tailored to enforce multisite phosphorylation-dependent degradation of Sic1, which in turn would help engender a
highly cooperative onset of DNA replication (Nash et al., 2001). Similar principles may well operate for other Cdc4
substrates, including cyclin E, Notch and presenilin in mammalian cells (Strohmaier et al., 2001; Lai, 2002; Selkoe,
2001). Because yeast and human Cdc4 are structurally and functionally analogous (Nash et al., 2001; Strohmaier et
al., 2001; Koepp et al., 2001), the structure of yeast Cdc4 affords obvious insights for pharmacological modulation of
hCdc4 function in these pathways. Interestingly, a significant proportion of characterized human and fly CDC4
mutations alter residues in the CPD binding pocket. Given the probable requirement for homodimerization in active
SCF complexes (Wolf et al., 1999), such mutations might act in a partial dominant negative manner to confer a
growth advantage in the heterozygous state.

Phospho-recognition by Cdc4. The specificity of phosphorylation-dependent recognition by the WD40 domain of
Cdc4 is governed by three main determinants: (i) a dedicated pThr-Pro binding pocket; (ii) a deep hydrophobic pocket
that selects hydrophobic residues N-terminal to the phosphorylation site; and (iii) a through-space electrostatic
selection against basic residues C-terminal to the phosphorylation site. As for all documented phospho-dependent
lipid/protein recognition modules, the Cdc4 WD40 domain employs arginine residues to directly contact the
phosphate group of the ligand. However, unlike most domains in which adjacent residues impose subtle effects on
specificity (Yaffe and Elia, 2001), the P+1 proline is an integral component of the core binding determinant (Nash et
al., 2001). In the Cdc4-CPD co-crystal, ligand residues are locked in place by direct contact of the phosphate and
proline carbonyl groups with three conserved and essential Arg residues, while the proline side chain inserts into a
tight hydrophobic pocket formed by Trp426, Thr441, and Thr465. Because the phospho-binding pocket infrastructure
has no obvious demarcation between the pThr and Pro binding sites, the Cdc4 WD40 domain is in effect a dedicated
pThr-Pro binding module.

Comparison to other peptide recognition modules. Interesting parallels can be drawn between the Cdc4 WD40
domain, 14-3-3 domains and the class IV WW domains, which all have the ability to recognize phospho-Ser/Thr
epitopes in the context of adjacent proline residues (Yaffe and Elia, 2001). The interaction of the Pin1 class IV WW
domain with a pSer-Pro peptide differs from Cdc4 in that it does not rely on an extensive network of Arg residues for
phosphate coordination (Verdecia et al., 2000). However, a striking similarity between Pin1 and Cdc4 lies in the P+1
proline binding pocket, which in both cases depends on a highly conserved tryptophan side chain to engage the P+1
proline pyrrolidine ring through a coplanar interaction. In contrast to Cdc4, Pin1 actually displays a preference for Arg
in the P+2 position, such that the binding specificity of the pSer-Pro recognition domain closely matches that of the
targeting CDK enzymes.

14-3-3 domains bind pSer epitopes with a preference, but not an absolute requirement, for proline residues at
the P+2 position (Yaffe et al., 1997). This less stringent selection arises because the 14-3-3 proline binding pocket is
able to accommodate other residues with a propensity to form β-turns. Interestingly, the proline preferences in both the
14-3-3 and Cdc4 WD40 domains give rise to the same qualitative effect: in each case the prolines terminate direct
interactions between the peptide and the ligand binding domain by orienting the peptide away from the domain
surface. In the case of Cdc4, biologically significant electrostatic effects operate in spite of the loss of direct peptide
contact. Physiologically relevant substrate anti-selection mediated by charge repulsion is unique amongst known
protein interaction modules.

The structure of the Cdc4 WD40 domain provides direct evidence that WD40-type repeats can assemble into
propellers with more than seven blades (Fulop and Jones, 1999). WD40 domains are known to interact with other
proteins in at least two different modes, either across the front face of the propeller, as in the case of Cdc4, or on the
outer edge of the propeller as in the case of clathrin (ter Haar et al., 2000). Modeling of the F-box protein β-TrCP,
which binds the doubly phosphorylated consensus motif DpSGXXpS [SEQ ID NO. 42] in IκBα, β-catenin, and Vpu
(Yaffe and Elia, 2001), reveals an extensive conserved basic region on the front face of the propeller, which may
engage substrate phosphoepitopes in an analogous manner to Cdc4.

Spatial orientation of SCF substrates. A conserved feature of all E3 structures solved to date is the substantial
distance between the substrate binding site and the catalytic site (Huang et al., 1999; Zheng et al., 2000; Zheng et al.,
2002). Superposition of the Skp1-Cdc4 complex onto a model of the Skp1-Cul1-Rbx1-E2-ubiquitin complex suggests
that the substrate is positioned for direct frontal attack by the E2 catalytic site, but that a gap of some 59 Å between the
two sites must be bridged, presumably by the substrate polypeptide. The disordered structure of Sic1 lends itself to
this possibility (Nash et al., 2001). Intriguingly, overlay of the WD40 domain of Cdc4 with the LRR of Skp2 does not
align the defined phosphopeptide binding pocket of Cdc4 with a potential phospho-recognition site on the concave
face of the LRR repeats (Zheng et al., 2002), at least as defined by mutational analysis of the related F-box protein
Grr1 in yeast (Hsiung et al., 2001). If the relative positions of substrates in the WD40 versus LRR class of F-box
proteins do in fact differ, spatial leeway in substrate presentation must be possible.

Based on the extensive Skp1-Skp2 interface, and on the inactivation of Cul1 by insertion of a flexible linker,
it has been proposed that SCF complexes, and perhaps E3 enzymes in general, must present substrates to the catalytic
site in a rigidly defined fashion (Zheng et al., 2002). Unexpectedly, the WD40 domain and the F-box of Cdc4 are
linked only by a single α-helical stalk, with very limited additional contacts. Despite the lack of sequence
conservation in the helix α6 structure that supports the WD40 domain, spatial constraints are nevertheless evident, as
shown by the sensitivity of the structure to rotational and translational shifts caused by insertion of additional residues
into the stalk. It is also possible that regions truncated from Cdc4 to enable crystallization may normally help stabilize
the inter-domain interface.

Cooperativity in substrate selection by Cdc4. The properties of the Cdc4 phosphopeptide binding module differ
from those of other known modules in the important respect that the interaction with core recognition elements is
partially offset by specific selection against basic residues in the substrate peptide. This feature establishes an intrinsic
antagonism between the recognition mechanism and the targeting CDK kinases, which prefer Ser/Thr-Pro sites with
C-terminal basic residues (Endicott et al., 1999). Significantly, all of the natural CPD motifs in Sic1 contain one or
more mismatches to the optimal CPD consensus. This system based on positive and negative ligand selection may not
only set an elevated threshold for kinase activity, but may also allow the threshold to be precisely tuned for any given
substrate by varying the number, spacing and properties of each site. Thus, Cdc4 is able to target numerous critical
factors for phosphorylation-dependent degradation, including the CDK inhibitor Sic1, the CDK inhibitor and
polarization factor Far1, the replication initiator Cdc6 and the transcription factor Gcn4, all of which may be
controlled with different kinetics and different phosphorylation thresholds (Patton et al., 1998). In one extreme,
typified by Gcn4 and cyclin E, the substrate may contain a high affinity site that is augmented by several minor low
affinity sites (Meimoun et al., 2000; Chi et al., 2001; Strohmaier et al., 2001). In the other, more akin to Sic1, a large
number of weak sites may cooperate to drive high affinity binding only when a phosphorylation threshold is reached.
As shown here, mutation of either the distal basic selection region or the P-1 pocket in Cdc4 shifts the binding
equilibrium to lower phosphorylated forms of Sic1, which, in the absence of other structural effects that may
compromise Cdc4, would be predicted to cause premature DNA replication and genome instability (Nash et al., 2001).
These features distinguish Cdc4 from other phospho-peptide binding modules characterized to date, which
typically interact with dedicated sites on their substrates through a single high affinity interaction.

The mechanism that underlies the cooperative binding transition of the phospho-Sic1-Cdc4 interaction
between five and six phosphorylation sites remains to be determined. In principle, multiple interaction sites might
increase binding by engaging more than one binding site on Cdc4 (Figure 9D). This type of cooperative interaction is
common in biological systems, as in the avidity of antibodies for polyvalent ligands and pathogen-host interactions
(Mammen et al., 1998). Analogous cooperative binding interactions occur in signaling pathways. For instance, the
dual SH2 domain phosphatase SH-PTP2 and the 14-3-3ζ protein both engage two substrate binding sites on their
respective ligands (Eck et al., 1996; Yaffe et al., 1997). However, inspection of the Cdc4 WD40 domain surface does
not reveal any obvious ligand binding sites that might accommodate a second phosphorylated peptide motif, nor is
there any biochemical evidence for secondary binding sites (Nash et al., 2001). In addition, the wide range of
substrates and site spacing accommodated by Cdc4, including random concatamers of synthetic CPD sites (Nash et
al., 2001), is a priori difficult to explain by two or more fixed binding sites on Cdc4.

Instead, a model is favored that requires only a single phospho-dependent binding site on Cdc4 (Figure 9D).
In this scheme, phosphorylation of multiple CPD sites on Sic1 increases the local concentration of sites around Cdc4
once the first CPD site is bound, to the point where diffusion-limited escape from the receptor is overwhelmed by the
probability of re-binding of any one CPD site. In a sense, Sic1 becomes kinetically trapped in close proximity to
Cdc4. Mathematical modeling of an idealized polyvalent ligand-monovalent receptor interaction indicates that the rate
of ligand escape from the receptor exhibits a negative exponential dependence on the number of ligand sites. The term
allovalent is proposed to describe the ability of multiple weak spatially separated ligand sites to cooperatively interact
with a single receptor site. The prevalence of multisite phosphorylation (Cohen, 2000), and indeed of polyvalent
ligands in general (Mammen et al., 1998), suggests that this type of behavior may underlie many biological processes.
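By way of illustration only, the negative exponential dependence of the escape rate on the number of ligand sites may be sketched numerically. The functional form used here, k_escape(n) = k_single * exp(-decay_per_site * (n - 1)), and the names escape_rate, k_single and decay_per_site are assumptions made for this sketch; they are not taken from the mathematical modeling referred to above, and the parameter values are arbitrary.

```python
import math


def escape_rate(n_sites, k_single=1.0, decay_per_site=1.0):
    """Hypothetical escape rate for a ligand bearing n_sites weak binding sites.

    Assumed form: the rate falls off exponentially with each additional site,
    relative to the rate k_single of a ligand with a single site.
    """
    if n_sites < 1:
        raise ValueError("a ligand needs at least one site")
    return k_single * math.exp(-decay_per_site * (n_sites - 1))


# Tabulate the assumed escape rate for ligands with one to nine sites.
for n in range(1, 10):
    print(n, escape_rate(n))
```

Under this assumed form each added site lowers the escape rate by the same constant factor, so that a ligand with many weak sites may remain near the receptor far longer than a ligand with a single site.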

The present invention is not to be limited in scope by the specific embodiments described herein, since such
embodiments are intended as but single illustrations of one aspect of the invention and any functionally equivalent
embodiments are within the scope of this invention. Indeed, various modifications of the invention in addition to those
shown and described herein will become apparent to those skilled in the art from the foregoing description and
accompanying drawings. Such modifications are intended to fall within the scope of the appended claims. In particular
it will be appreciated that the references to specific amino acid residues for particular SCF complexes, and
components thereof (e.g. F-box protein) illustrated in the Tables and Figures, in no way limit the scope of the
invention and it will be appreciated that a person skilled in the art could determine the specific corresponding amino
acid residues for other SCF complexes and components thereof.

All publications, patents and patent applications referred to herein are incorporated by reference in their 
entirety to the same extent as if each individual publication, patent or patent application was specifically and 
individually indicated to be incorporated by reference in its entirety. All publications, patents and patent applications 
mentioned herein are incorporated herein by reference for the purpose of describing and disclosing the cell lines, 
vectors, methodologies etc. which are reported therein which might be used in connection with the invention. Nothing 
herein is to be construed as an admission that the invention is not entitled to antedate such disclosure by virtue of prior 
invention. 

It must be noted that as used herein and in the appended claims, the singular forms "a", "an", and "the"
include plural reference unless the context clearly dictates otherwise. Thus, for example, reference to "a host cell" 
includes a plurality of such host cells, reference to the "antibody" is a reference to one or more antibodies and 
equivalents thereof known to those skilled in the art, and so forth. 
Table 1.

Data Collection, Structure Determination, and Refinement Statistics

                          Peak            Inflection      Remote
Wavelength (Å)            0.9798          0.9800          0.9000
Resolution (Å)            2.8             2.9             2.7
Rsym (%)                  5.9 (38.7)      6.1 (36.1)      5.0 (28.9)
Total Reflections         311509          187010          298371
Unique Reflections        107167          96027           116218
Completeness (%)          99.8 (99.1)     99.3 (98.3)     97.7 (93.6)
I/σ                       9.9 (2.7)       7.4 (2.1)       10.1 (2.9)
Phasing Power

Refinement statistics

Resolution range (Å)      20-2.8
Reflections               103863
Rfactor/Rfree (%)         24.09/28.71
Rms deviations
  Bonds (Å)               0.0091
  Angles (°)              1.3453

Space group P3(2): a = b = 107.7 Å, c = 168.3 Å; α = β = 90°, γ = 120°;
Two molecules per asymmetric unit.

* Data for the highest resolution shell (2.90-2.80 Å)
2 Rsym = 100 x Σ|I - <I>|/Σ<I>, where I is the observed intensity and <I> is the average intensity from multiple
observations of symmetry-related reflections.
3 Phasing power for isomorphous and anomalous acentric reflections, where phasing power = <(|F_H(calc)| / phase-integrated
lack of closure)>.
4 Rfree was calculated with 8.8% of the data.
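As a minimal sketch of the Rsym definition given in the footnote above, the following assumes that the observed intensities have already been grouped by unique reflection. The function name r_sym and the toy intensities are hypothetical and serve only to show the arithmetic; they are not data from the crystals described herein.

```python
def r_sym(observations):
    """Return Rsym (%) = 100 * sum|I - <I>| / sum<I>.

    observations maps each unique reflection to the list of intensities
    measured for its symmetry-related observations. Both sums run over
    every individual observation.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += mean_intensity * len(intensities)
    return 100.0 * numerator / denominator


# Toy example: two unique reflections, each observed twice.
toy = {(1, 0, 0): [100.0, 110.0], (0, 1, 0): [50.0, 50.0]}
print(round(r_sym(toy), 2))
```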
Table 2.

Data Collection, Structure Determination, and Refinement Statistics

Phasing Statistics

                          Peak            Inflection      Remote
Wavelength (Å)            0.9798          0.9800          0.9000
Resolution (Å)            2.8 (25-2.8)    2.9 (3.0-2.9)   2.7 (2.8-2.7)
Rsym (%)                  5.9 (37.2)      6.1 (36.1)      5.0 (28.9)
Total Reflections         311509          187010          298371
Unique Reflections        107167          96027           116218
Completeness (%)          99.8 (99.1)     99.3 (98.3)     97.7 (93.6)
I/σ                       9.9 (2.7)       7.4 (2.1)       10.1 (2.9)
Phasing Power (ISO/ANO)   5.2/1.3         4.0/0.94        0/0.91

Refinement statistics (remote wavelength)

Resolution range (Å)      20-2.7
Reflections               113960
Rfactor/Rfree (%)         23.8/27.3
Rms deviations
  Bonds (Å)               0.0089
  Angles (°)              1.42
# protein atoms           9364
# water molecules         72

Space group P3(2): a = b = 107.7 Å, c = 168.3 Å; α = β = 90°, γ = 120°
Two Skp1-Cdc4-CPD complexes per asymmetric unit

Rfree was calculated with 8.8% of the data.
Table 3.

Atomic Contacts of a Substrate Binding Pocket

No. of Atomic   CDC4 WD40 Motif                  CDC4 Atomic          CPD Motif
Interaction     Atomic Contact                   Contact              Atomic Contact

1               Ile 295, Ile 296, Leu 315,       Phe 255, Leu 356
                Trp 316, Leu 319
2               Val 687, Ile 696, Leu 726,       Trp 365, Ile 364
                Phe 743
3               Phe 743                          Asn 364
4               Asn 684, Arg 700                 Glu 323
5               Arg 485, Arg 534, Tyr 548                             P0 pTyr phosphate
6               Trp 426, Arg 485, Thr 386,                            P+1 Proline side chain
                Thr 441, Thr 465
7               Trp 426, Trp 717, Thr 386,                            P+1 Leucine side chain
                Val 384
8               Tyr 574, Thr 386, Val 384                             Leucine +2

Table 4.

Atomic Contacts of a Substrate Binding Pocket

No. of Atomic   CDC4 WD40 Motif/F-box Domain   CDC4 Atomic         CPD Motif                      Atomic Interaction
Interaction     Atomic Contact                 Contact             Atomic Contact                 Property

1               Ile 295, Ile 296, Leu 315,     Phe 355, Leu 356                                   hydrophobic interactions and
                Trp 316, Leu 319                                                                  van der Waals interactions
2               Val 687, Ile 696, Leu 726,     Trp 365, Ile 361                                   van der Waals and hydrophobic
                Phe 743                                                                           interactions
3               Phe 743                        Asn 364                                            hydrogen bond
4               Asn 684, Arg 700               Glu 323                                            hydrogen bonds
5               Arg 485, Arg 467, Arg 534,                         P0 pTyr or pSer phosphate      electrostatic interactions,
                Tyr 548                                            at P-0 position of CPD         hydrogen bond
6               Trp 426, Arg 485, Thr 441,                         P+1 Proline side chain and     hydrogen and van der Waals
                Thr 465                                            backbone carbonyl of CPD       hydrophobic interactions
7               Trp 426, Trp 717, Thr 386,                         P-1 Leucine (or Ile/Pro)       hydrophobic interactions
                Val 384                                            side chain
8               Tyr 574, Met 590, Leu 634                          Leucine -2; Leu/Ile at         hydrophobic interactions
                                                                   P-2 position
9               Tyr 342, Leu 338, Leu 334                                                         hydrophobic interactions
Table 6

REMARK peptide link removed (applied DPEP): from A 31 to A 45
REMARK peptide link removed (applied DPEP): from A 73 to A 86
REMARK peptide link removed (applied DPEP): from B 496 to B 508
REMARK peptide link removed (applied DPEP): from C 31 to C 45
REMARK peptide link removed (applied DPEP): from C 73 to C 86
REMARK peptide link removed (applied DPEP): from D 496 to D 508
REMARK peptide
REMARK coordinates from minimization and B-factor refinement
REMARK refinement resolution: 20 - 2.8 A
REMARK starting r= 0.2415 free_r= 0.2846
REMARK final r= 0.2409 free_r= 0.2871
REMARK rmsd bonds= 0.009114 rmsd angles= 1.34531
REMARK B rmsd for bonded mainchain atoms= 1.230 target= 1.5
REMARK B rmsd for bonded sidechain atoms= 1.778 target= 2.0
REMARK B rmsd for angle mainchain atoms= 2.103 target= 2.0
REMARK B rmsd for angle sidechain atoms= 2.675 target= 2.5
REMARK target= mlf final wa= 2.77695
REMARK final rweight= 0.1078 (with wa= 2.77695)
REMARK md-method= torsion annealing schedule= constant
REMARK starting temperature= 2000 total md steps= 1 * 100
REMARK cycles= 2 coordinate steps= 20 B-factor steps= 10
REMARK sg= P3(2) a= 107.669 b= 107.669 c= 168.3 alpha= 90 beta= 90 gamma= 120
REMARK topology file 1 : CNS_TOPPAR:protein.top
REMARK topology file 2 : CNS_TOPPAR:dna-rna.top
REMARK topology file 3 : CNS_TOPPAR:water.top
REMARK topology file 4 : CNS_TOPPAR:ion.top
REMARK topology file 5 : CNS_TOPPAR:tpo.top
REMARK parameter file 1 : CNS_TOPPAR:protein_rep.param
REMARK parameter file 2 : CNS_TOPPAR:dna-rna_rep.param
REMARK parameter file 3 : CNS_TOPPAR:water_rep.param
REMARK parameter file 4 : CNS_TOPPAR:ion.param
REMARK parameter file 5 : CNS_TOPPAR:tpo.param
REMARK molecular structure file: automatic
REMARK input coordinates: 36modl.pdb
REMARK reflection file= remote.cv
REMARK ncs= none
REMARK B-correction resolution: 6.0 - 2.8
REMARK initial B-factor correction applied to fobs :
REMARK B11= 1.580 B22= 1.580 B33= -3.160
REMARK B12= -3.767 B13= 0.000 B23= 0.000
REMARK B-factor correction applied to coordinate array B: 0.915
REMARK bulk solvent: density level= 0.324998 e/A^3, B-factor= 34.4718 A^2
REMARK reflections with |Fobs|/sigma_F < 0.0 rejected
REMARK reflections with |Fobs| > 10000 * rms(Fobs) rejected
REMARK anomalous diffraction data was input
REMARK theoretical total number of refl. in resol. range: 107240 ( 100.0 % )
REMARK number of unobserved reflections (no entry or |F|=0): 3377 ( 3.1 % )
REMARK number of reflections rejected: 0 ( 0.0 % )
REMARK total number of reflections used: 103863 ( 96.9 % )
REMARK number of reflections in working set: 93784 ( 87.5 % )
REMARK number of reflections in test set: 10079 ( 9.4 % )
CRYST1 107.669 107.669 168.300 90.00 90.00 120.00 P 32
REMARK FILENAME="ref37.pdb"
REMARK DATE:28-Jun-2002 13:23:24 created by user: orlicky
REMARK VERSION: 1.1
ATOM 106 CA VAL A 16 64.954 74.330 68.592 1.00 36.47 A
ATOM 107 CB VAL A 16 64.597 73.166 67.628 1.00 38.19 A
ATOM 108 CG1 VAL A 16 65.697 72.981 66.579 1.00 37.94 A
ATOM 109 CG2 VAL A 16 64.403 71.884 68.421 1.00 40.71 A
ATOM 110 C VAL A 16 65.992 73.862 69.601 1.00 36.55 A
ATOM 111 O VAL A 16 67.199 74.020 69.398 1.00 36.18 A
ATOM 112 N ASP A 17 65.511 73.294 70.699 1.00 36.09 A
ATOM 113 CA ASP A 17 66.398 72.817 71.750 1.00 36.61 A
ATOM 114 CB ASP A 17 65.586 72.082 72.812 1.00 38.44 A
ATOM 115 CG ASP A 17 66.458 71.416 73.840 1.00 40.95 A
ATOM 116 OD1 ASP A 17 66.418 70.164 73.924 1.00 43.10 A
ATOM 117 OD2 ASP A 17 67.184 72.137 74.556 1.00 41.69 A
ATOM 118 C ASP A 17 67.499 71.902 71.196 1.00 35.77 A
ATOM 119 O ASP A 17 67.218 70.903 70.543 1.00 33.05 A
ATOM 120 N LYS A 18 68.757 72.245 71.471 1.00 36.17 A
ATOM 121 CA LYS A 18 69.897 71.473 70.972 1.00 35.91 A
ATOM 122 CB LYS A 18 71.208 72.003 71.541 1.00 35.90 A
ATOM 123 CG LYS A 18 72.397 71.086 71.239 1.00 36.70 A
ATOM 124 CD LYS A 18 73.679 71.525 71.964 1.00 37.63 A
ATOM 125 CE LYS A 18 74.131 72.918 71.523 1.00 38.26 A
ATOM 126 NZ LYS A 18 75.528 73.223 71.956 1.00 38.28 A
ATOM 127 C LYS A 18 69.776 70.010 71.319 1.00 36.47 A
ATOM 128 O LYS A 18 70.048 69.129 70.497 1.00 35.04 A
ATOM 129 N LYS A 19 69.388 69.756 72.559 1.00 38.15 A
ATOM 130 CA LYS A 19 69.220 68.393 73.011 1.00 38.79 A
ATOM 131 CB LYS A 19 68.733 68.374 74.456 1.00 39.52 A
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/ 


67 . 


r c c 

bb6 


1 0 
-13 . 


^ 0 1 
631 


0 0 
/2 . 


/IOC 

4 8b 


1 


ft ft 

. 00 


ft 1 
91 


ft ft 
. 90 


TV rn r\\A 

ATOM 


y ozu 


0 


/—> T XT, 

GLN 


77 


/ 


6 / . 


A O ft 

4 2 9 


1 A 

-14 . 


000 
383 


o 0 

/ 3 . 


A O 1 

431 


1 


ft ft 

. 00 


ft 0 
92 


. 11 


ATOM 


9321 


N 


SER 


E 


0 
O 


68 . 


ft C 1 

Ool 


1 A 

-14 . 


ft c c 

055 


71 . 


r> 0 0 
282 


1 


ft ft 

. 00 


ri ft 
92 


ft c 

. 25 


Al OM 


ft O O O 

y 322 


CA 


C" T7 ft 

SER 


T7 

b 


0 
O 


c 0 
b8 . 


00c 
2 3o 


-lo . 


A C~l 

4 6/ 


O ft 

/0 . 


0 0 0 
933 


1 


ft ft 

. 00 


0 0 
92 


1 A 

. 14 


7\ rp /"NX * 

ATOM 


9323 


CB 


SER 


E 


0 
8 


67 . 


113 


-15 . 


Oft/" 

8 96 


ft 
69 . 


ft 0 1 
981 


1 


ft ft 

. 00 


ft ft 
92 


0 c 

. 25 


Al OM 


ft O O >l 

y 32 4 


OG 


SER 


E 


0 
8 


b b . 


ft o 0 

9/8 


-17. 


O ft c 

30b 


C ft 

69 . 


ft 0 0 
938 


1 


ft ft 

. 00 


0 0 
92 


ti 0 
. 67 


ATOM 


n n n r 

932b 


C 


SER 


E 


8 


C ft 

69 . 


603 


-15 . 


64 0 


70 . 
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1 


ft ft 

. 00 


92 


. 02 


ATOM 9326 O SER E 8 70.638 -15.398 70.879 1.00 91.94
ATOM 9327 N GLY E 9 69.602 -16.057 68.994 1.00 91.78
ATOM 9328 CA GLY E 9 70.854 -16.235 68.277 1.00 91.39
ATOM 9329 C GLY E 9 70.848 -15.537 66.928 1.00 91.23
ATOM 9330 O GLY E 9 69.743 -15.212 66.445 1.00 90.99
ATOM 9331 OXT GLY E 9 71.939 -15.322 66.346 1.00 90.21
ATOM 9332 CB LEU F 2 137.489 -5.702 48.818 1.00 79.71
ATOM 9333 CG LEU F 2 138.842 -5.743 49.546 1.00 80.18
ATOM 9334 CD1 LEU F 2 139.667 -4.549 49.099 1.00 79.66
ATOM 9335 CD2 LEU F 2 138.661 -5.730 51.075 1.00 78.72
ATOM 9336 C LEU F 2 138.487 -6.493 46.630 1.00 78.95
ATOM 9337 O LEU F 2 138.336 -7.720 46.632 1.00 77.64
ATOM 9338 N LEU F 2 136.056 -5.715 46.790 1.00 79.28
ATOM 9339 CA LEU F 2 137.454 -5.545 47.286 1.00 79.36
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ATOM 


934 0 


N 


LEU 


F 


3 


ion 
13 9. 


C A A 

bZ\J 


c 

-o . 


o / 1 


4 b . 


A C A 

U oU 


1 . 


A A 
UU 


n q 


Zb 


F 




ATOM 


a a a i 

9341 


CA 


LEU 


F 


3 


t a a 
14 0. 


cn a 

. 67 U 


c 
-o . 


C A A 

oU2 


A C 

4 D . 


A A A 

403 


1 . 


A A 

00 


"7 C 


C A 

54 


F 




ATOM 


9342 


CB 


LEU 


F 


3 


7 A A 

14 0. 


A A A 

,4 33 


c 
- D , 


1 A A 
/ 4 U 


A A. 

4 3 . 


Aid 

915 


7 

1 . 


A A 
UU 


1 1 . 


O A 

32 


F 




ATOM 


9343 


CG 


LEU 


F 


■a 
3 


139 . 


A C Q 


- / . 


Q A A 

8U2 


/ A. 

4 3 . 


/I A Q 

42 9 


1 . 


A A 
UU 


H Q 


Q O 
0 O 


F 


5 


ATOM 


9344 


CD1 


LEU 


F 


3 


i a o 

138 . 


1 A A 

, 100 


*7 


7 A A 
190 


A A 

4 3 . 


A O 1 

081 


7 

1 . 


A A 

00 


O A 


1 A 

14 


F 




ATOM 


9345 


CD2 


LEU 


F 


3 


"I A A 

140 , 


A C C 


o 
-o . 


A A A 

,449 


4 2 . 


OA/ 

204 


7 

1 . 


A A 

00 


1 o . 


/ 1 


F 




ATOM 


9346 


C 


LEU 


F 


3 


7 A 7 

141. 


n a a 

.723 


-5 . 


A 7 C 

,415 


A C 

45 . 


C A A 

530 


1 . 


A A 

00 


lb . 


A A 

20 


F 




ATOM 9347 O LEU F 3 141.444 -4.264 45.195 1.00 75.82 F
ATOM 9348 N TPO F 4 142.936 -5.746 46.012 1.00 72.87 F
ATOM 9349 CA TPO F 4 144.008 -4.734 46.163 1.00 70.82 F
ATOM 9350 CB TPO F 4 144.161 -4.534 47.709 1.00 68.53 F
ATOM 9351 CG2 TPO F 4 145.393 -3.748 48.146 1.00 70.46 F
ATOM 9352 OG1 TPO F 4 142.965 -3.903 48.203 1.00 65.51 F
ATOM 9353 P TPO F 4 142.778 -2.260 48.081 1.00 60.19 F
ATOM 9354 O1P TPO F 4 142.871 -1.889 46.565 1.00 64.16 F
ATOM 9355 O2P TPO F 4 143.905 -1.501 48.889 1.00 62.63 F
ATOM 9356 O3P TPO F 4 141.319 -1.975 48.672 1.00 64.49 F
ATOM 9357 C TPO F 4 145.311 -5.244 45.579 1.00 70.94 F
ATOM 9358 O TPO F 4 145.629 -6.541 45.562 1.00 71.31 F
ATOM 9359 N PRO F 5 146.116 -4.278 45.081 1.00 71.04 F
ATOM 9360 CD PRO F 5 145.651 -2.886 44.895 1.00 69.22 F
ATOM 9361 CA PRO F 5 147.434 -4.439 44.468 1.00 71.37 F
ATOM 9362 CB PRO F 5 147.722 -3.051 43.898 1.00 70.57 F




ATOM 


9363 


CG 


PRO 


F 


5 


146 


. 37 9 


-2 


.473 


A 1 

4 3 


/ jo 

. 673 


1 . 


A A 

00 


c o 
DO . 


5b 


F 


25 


ATOM 


9364 


C 


PRO 


F 


5 


148 


.447 


-4 


. 823 


4 5 


C A A 

. 533 


1 . 


A A 

00 


1 A 

72 . 


r a 

63 


F 




ATOM 


9365 


O 


PRO 


F 


5 


14 8 


. 273 


-4 


. 524 


4 6 


T A A 

.709 


1 , 


A A 

00 


1 A 

72 . 


A tl 

2 D 


F 




ATOM 


9366 


N 


PRO 


F 


6 


14 9 


. 531 


-5 


.484 


4 o 


7 A A 

. 133 


1 . 


A A 

UU 




ti Q 

oo 


TP 

r 




ATOM 


9367 


CD 


PRO 


F 


6 


14 9 


.883 


-5 


.888 


» a 

4 3 


"1 f A 

. 7 60 


1 . 


A A 

00 


1 A 

7 4 . 


bb 


F 




ATOM 


9368 


CA 


PRO 


F 


6 


150 


tz r n 

. 563 


r 

-5 


o o o 

. 898 


4 6 


A O A 

.08/ 


1 . 


A A 

, UU 


1 b . 


A A 

. 4 4 


TP 
£ 


30 


ATOM 


9369 


CB 


PRO 


F 


6 


151 


. 636 


-6 


.4 99 


4 5 


7 A A 

. 180 


1 . 


A A 

, 00 


1 6 . 


7 *7 

17 


F 




ATOM 


9370 


CG 


PRO 


F 


6 


150 


. 843 


-7 


. 015 


4 4 


. 005 


1 . 


A A 

, 00 


lb . 


3 6 


F 




ATOM 9371 C PRO F 6 151.108 -4.721 46.907 1.00 78.12 F
ATOM 9372 O PRO F 6 151.745 -3.827 46.351 1.00 78.87 F
ATOM 9373 N GLN F 7 150.858 -4.699 48.213 1.00 79.62 F
ATOM 9374 CA GLN F 7 151.397 -3.605 49.020 1.00 81.37 F
ATOM 9375 CB GLN F 7 150.607 -3.435 50.333 1.00 81.68 F
ATOM 9376 CG GLN F 7 151.014 -2.213 51.171 1.00 82.68 F
ATOM 9377 CD GLN F 7 150.936 -0.896 50.397 1.00 83.36 F
ATOM 9378 OE1 GLN F 7 151.562 -0.740 49.343 1.00 84.22 F
ATOM 9379 NE2 GLN F 7 150.175 0.061 50.927 1.00 82.82 F
ATOM 9380 C GLN F 7 152.866 -3.939 49.299 1.00 82.45 F
ATOM 9381 O GLN F 7 153.186 -4.607 50.282 1.00 82.78 F
ATOM 9382 N SER F 8 153.745 -3.472 48.411 1.00 83.82 F
ATOM 9383 CA SER F 8 155.196 -3.704 48.479 1.00 85.47 F
ATOM 9384 CB SER F 8 155.937 -2.583 47.732 1.00 84.24 F
ATOM 9385 OG SER F 8 155.277 -2.236 46.527 1.00 83.28 F
ATOM 9386 C SER F 8 155.813 -3.857 49.887 1.00 86.86 F
ATOM 9387 O SER F 8 156.285 -2.881 50.482 1.00 87.36 F
ATOM 9388 N GLY F 9 155.830 -5.093 50.390 1.00 87.52 F
ATOM 9389 CA GLY F 9 156.387 -5.374 51.708 1.00 87.82 F
ATOM 9390 C GLY F 9 155.445 -6.117 52.646 1.00 87.94 F
ATOM 9391 O GLY F 9 154.243 -6.240 52.319 1.00 88.07 F
ATOM 9392 OXT GLY F 9 155.901 -6.573 53.717 1.00 87.76 F
END
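By way of example only, coordinate records of the kind listed in Table 6 may be read programmatically and used to measure interatomic distances. The sketch below assumes whitespace-separated ATOM records as printed in this document (standard PDB files use fixed-width columns instead), and the helper names parse_atom_record and distance are hypothetical. The two sample records are the CA and CB atoms of Val 16 of chain A from the table.

```python
import math


def parse_atom_record(line):
    """Parse one whitespace-separated ATOM record into a dictionary.

    Assumes the field order used in this listing: serial, atom name,
    residue name, chain, residue number, x, y, z, occupancy, B-factor.
    """
    fields = line.split()
    if len(fields) < 11 or fields[0] != "ATOM":
        raise ValueError(f"not a complete ATOM record: {line!r}")
    return {
        "serial": int(fields[1]),
        "name": fields[2],
        "resname": fields[3],
        "chain": fields[4],
        "resseq": int(fields[5]),
        "xyz": tuple(float(v) for v in fields[6:9]),
        "occupancy": float(fields[9]),
        "bfactor": float(fields[10]),
    }


def distance(atom_a, atom_b):
    """Euclidean distance in angstroms between two parsed atoms."""
    return math.dist(atom_a["xyz"], atom_b["xyz"])


ca = parse_atom_record("ATOM 106 CA VAL A 16 64.954 74.330 68.592 1.00 36.47 A")
cb = parse_atom_record("ATOM 107 CB VAL A 16 64.597 73.166 67.628 1.00 38.19 A")
print(f"CA-CB distance: {distance(ca, cb):.2f} A")
```

The same distance function could be applied to pairs of atoms drawn from a binding pocket residue and a CPD residue in order to list candidate atomic contacts within a chosen cutoff.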
Table 7.

Oligonucleotides used in Example 2

Name        Oligo name     Sequence

MTO 1146    LP6DelA-FWD    gccttcgcgaactgatgtacactttgcagggtcatacagc
MTO 1147    LP6DelA-REV    gccttggccattttctagatcccaaattctaatagtggtatccatac
MTO 1254    CDC4-BNSH-F    gtaggatccatatgggcgccgcaagctttcccttagctgagtttcc
MTO 1367    V384N-C        catatgacgagtaatattacgtgcttgc
MTO 1368    V384N3         caagcacgtaatattactcgtcatatg
MTO 1369    K402A-C        ggggctgatgacgcaatgatcagag
MTO 1370    K402A-N        ctctgatcattgcgtcatcagcccc
MTO 1371    W426A5'        gatggtggggttgcggcgctgaagtatgcccatg
MTO 1372    DN426A         catgggcatacttcagcgccgcaaccccaccatc
MTO 1373    R443D-C        ggttctacagacgacacggtccgagtttggg
MTO 1374    R443D          tatcccaaactcggaccgtgtcgtcgtctgtagaaccg
MTO 1375    Y548F          cagtgtattatcaaagcttccactaacgac
MTO 1376    Y548F-C        gtcgttagtggaagctttgataatacactg
MTO 1377    Y574F-N        gtagattgtcgaaaatattcgatccgtatg
MTO 1378    Y574F-C        catacggatcgaatattttcgacaatctac
MTO 1379    W717N Hpa      ttgcccttaaagttaaccgagttaatctgatcag
MTO 1380    W717N-C        ctgatcagattaactcggttaactttaagggcaa
MTO 1381    2Flex359 5'    tcttttctggagcccgggccaattttaaaaaattggtac
MTO 1382    2Flex359 3'    gtaccaattttttaaaattggcccgggctccagaaaaga
MTO 1383    H5 Dest5'      ggttttaattctctcggcccgggaccctcccaaaaatacccaaaactc
MTO 1384    H5 Dest3'      gagttttgggtatttttgggagggtcccgggccgagagaattaaaacc
MTO 1385    H5 del 5'      gtgagcccaaagggtccaaagctttcacaacaagatcgc
MTO 1386    H5 del 3'      gcgatcttgttgtgaaagctttggaccctttgggctcac
MTO 1387    H5+Dest5'      gtggaaaaaacttctgatattcattttaaaaaattggtacaatcc
MTO 1388    H5+Dest3'      ggattgtaccaattttttaaaatgaatatcagaagttttttccac
MTO 1389    Ala1-3         ccaattttttaaaatgaatgcaatattctccagaaaag
MTO 1390    Ala2-3         ccaattttttaaaatgaatgctgcaatattctccagaaaag
MTO 1391    Ala3-3         ccaatttttttaaaatgaaagcggctgcaatattctccagaaaag
MTO 1392    Ala4-3         ccaattttttaaaatgaaggctgcagctgcaatattctccagaaaag
MTO 1393    Ala8 5'        gccgcgcgcgctaaagctgcagcgttcattttaaaaaattggtacaatcc
MTO 1394    Ala8 3'        cgctgcagctttagcgcgcgcggctatattctccagaaaagataatc
MTO 1395    Ala12 5'       gccgcgcgcgctaaagctgcagcgcgcgcgaaagccttcattttaaaaaattggtacaatcc
MTO 1396    Ala12 3'       ggctttcgcgcgcgctgcagctttagcgcgcgcggctatattctccagaaaagataatc
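Several of the oligonucleotides in Table 7 appear to be listed as complementary pairs for site-directed mutagenesis. As an illustrative sketch only, such a pairing may be checked by computing the reverse complement of one oligonucleotide and comparing it with the other. The helper name reverse_complement is hypothetical, and the sequences used are those listed for MTO 1369 (K402A-C) and MTO 1370 (K402A-N).

```python
# Translation table mapping each lowercase DNA base to its complement.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence (a, c, g, t only)."""
    return sequence.lower().translate(COMPLEMENT)[::-1]


k402a_c = "ggggctgatgacgcaatgatcagag"  # MTO 1369, K402A-C
k402a_n = "ctctgatcattgcgtcatcagcccc"  # MTO 1370, K402A-N

# True when the two oligonucleotides are exact reverse complements.
print(reverse_complement(k402a_c) == k402a_n)
```

Pairs of unequal length, such as MTO 1367 and MTO 1368, would not satisfy this exact equality and would need a comparison over the overlapping region instead.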
Table 8.
Plasmids used in Example 2

Plasmid     Relevant Characteristic              Source

pMT3169     pProEx Hta-CDC4 LP6DelA GST-Skp1     This study
pMT3055     pProEx Hta-CDC4 263-744 GST-Skp1     Nash et al.
pMT3000     pProEx Hta-CDC4 1-744 GST-Skp1       This study
pMT3217     pRS314 CDC4-BNSH/Eco                 This study
pMT3001     pMT3055 V384N                        This study
pMT3002     pMT3055 K402A                        This study
pMT3385     pMT3055 W426A                        This study
pMT3058     pMT3055 R443A                        Nash et al.
pMT3003     pMT3055 R443D                        This study
pMT3059     pMT3055 R467A                        Nash et al.
pMT3060     pMT3055 R485A                        Nash et al.
pMT3061     pMT3055 R534A                        Nash et al.
pMT3004     pMT3055 Y548F                        This study
pMT3005     pMT3055 Y574F                        This study
pMT3006     pMT3055 W717N                        This study
pMT3007     pMT3055 V384N+W717N                  This study
pMT3008     pMT3055 K402A+R443D                  This study
pMT3010     pMT3000 R443D                        This study
pMT3011     pMT3000 R443A                        This study
pMT3012     pMT3000 W717N                        This study
pMT3013     pMT3000 K402A+R443D                  This study
pMT3014     pMT3000 V384N+W717N                  This study
pMT3015     pMT3217 V384N                        This study
pMT3016     pMT3217 K402A                        This study
pMT3386     pMT3217 W426A                        This study
pMT3017     pMT3217 R443D                        This study
pMT3058     pRS314-CDC4 R443A                    Nash et al.
pMT3018     pMT3217 Y548F                        This study
pMT3068     pRS314-CDC4 R572A                    Nash et al.
pMT3019     pMT3217 Y574F                        This study
pMT3020     pMT3217 W717N                        This study
pMT3021     pMT3217 V384N+W717N                  This study
pMT3022     pMT3217 K402A+R443D                  This study
pMT3024     pMT3000 Helix 5 Δ 330-443            This study
pMT3025     pMT3000 Helix 5 breaker              This study
pMT3026     pMT3000 Helix 5 Δ 321-360            This study
pMT3027     pMT3000 Helix 6 breaker              This study
pMT3028     pMT3000 Ala1 insert                  This study
pMT3029     pMT3000 Ala2 insert                  This study
pMT3030     pMT3000 Ala3 insert                  This study
pMT3031     pMT3000 Ala4 insert                  This study
pMT3032     pMT3000 Ala8 insert                  This study
pMT3033     pMT3000 Ala12 insert                 This study
pMT3034     pMT3217 Helix 5 Δ 330-443            This study
pMT3035     pMT3217 Helix 5 breaker              This study
pMT3036     pMT3217 Helix 5 Δ 321-360            This study
pMT3037     pMT3217 Helix 6 breaker              This study
pMT3038     pMT3217 Ala1 insert                  This study
pMT3039     pMT3217 Ala2 insert                  This study
pMT3040     pMT3217 Ala3 insert                  This study
pMT3041     pMT3217 Ala4 insert